Open source, by Goldsky

Streaming data, built for transactional workloads.

A performant, extensible streaming runtime written in Rust using Apache Arrow and Apache DataFusion. Define pipelines in YAML. Hit production in seconds.

Star on GitHub
pipeline.yaml
sources:
  raw.transactions:
    type: kafka
    topic: raw.event.transaction

transforms:
  large_transactions:
    type: sql
    primary_key: id
    sql: |
      SELECT *
      FROM raw.transactions
      WHERE amount > 1000

sinks:
  pg.large_transactions:
    from: large_transactions
    type: postgres
    schema: public
    table: large_transactions
    primary_key: id
Why Streamling

Most streaming engines are built for data engineers. Streamling is built for application teams.

Streamling skips distributed shuffles, complex joins, and stateful aggregations. Instead it offers a streaming engine for application developers connecting datastores, enriching events with HTTP, and powering user-facing products.

Write transforms in TypeScript or SQL

WebAssembly-sandboxed JS/TS transforms run a process(input) function per record. Or use full DataFusion SQL with hundreds of built-in functions.

HTTP endpoints are first-class

Most app data lives behind APIs. Streamling uses HTTP for enrichment transforms and webhook sinks, not just Kafka.

Write to the databases you already use

Postgres, ClickHouse, and Kafka are core sinks. Schema and tables are created automatically; upsert semantics are tracked through the engine.

Single binary

No coordinator or ZooKeeper. Tens of thousands of messages per second on 0.5 of a CPU core.

Architecture

Pragmatic by design.

Streamling makes opinionated trade-offs so application teams don't have to babysit infrastructure. Less to learn. Less to operate.

BUILT ON DATAFUSION

Single-node execution

No distributed shuffle or coordinator. Scale out horizontally with Kafka consumer groups. Each instance picks up its own partitions, naturally.

STATE BACKEND OPTIONAL

Stateless

Operators are stateless by default. A pluggable state backend stores small bits of metadata, like Kafka offsets, in SQLite or Postgres.

CHANDY–LAMPORT CHECKPOINTS

At-least-once delivery

Markers flow source → transforms → sinks. Sinks flush before sources commit. Combined with sink-level upserts, you get effectively-once in most pipelines.

COLUMNAR, ZERO-COPY ARROW

Extreme efficiency

Tens of thousands of messages per second on half a CPU core. Each operator is a pull-based DataFusion ExecutionPlan, passing Arrow RecordBatch data over Tokio streams.

LIVE LOOKUPS, NO STATE

Dynamic tables, not joins

Need to filter against a list of IDs? Point a SQL transform at a Postgres-backed dynamic table. Update the lookup data without restarting the pipeline.

BUILD → VALIDATE → RUN

Delightful DX

Live inspection of in-flight data. Print and blackhole sinks for debugging. Instant startup. OpenTelemetry built in. validate mode for CI and agentic tools.

Connectors

Snap together the things you already use.

Kafka, Postgres, ClickHouse, HTTP, SQL, TypeScript. These are just the built-in connectors, more are available as plugins.

Sources
Kafka
Avro + Schema Registry. Consumer groups. Tracked offsets.
kafka.yaml
sources:
  ethereum.raw_blocks:
    type: kafka
    topic: mainnet.raw.blocks
    # Offsets committed only after sinks flush.
Transforms
Sinks
kafka.yaml
sources:
  ethereum.raw_blocks:
    type: kafka
    topic: mainnet.raw.blocks
    # Offsets committed only after sinks flush.
Need a different source, transform, or sink?Write a plugin
Plugin system

Extend everything. Without learning everything.

Sources, transforms, sinks, UDFs and topology preprocessors all share a single, simple trait. If the system you want to talk to has a Rust SDK, a plugin is a few dozen lines.

Stable ABI via abi_stable
Arrow RecordBatch over FFI
Backpressure handled for you
OpenTelemetry metrics built in
Async runtime provided
State backend access
sqs_sink.rs
use streamling_plugin::{register_plugin_sink, SinkPlugin};
use streamling_core::{Result, PluginError, CheckpointEpoch};
use arrow::record_batch::RecordBatch;

pub struct SqsSink { /* ... */ }

#[async_trait]
impl SinkPlugin for SqsSink {
    async fn initialize(&self) -> Result<(), PluginError> {
        // open connection, register schema, etc.
        Ok(())
    }

    async fn process_batch(&self, data: RecordBatch) -> Result<(), PluginError> {
        // your sink logic — react to the incoming RecordBatch
        self.client.send_batch(data).await?;
        Ok(())
    }

    async fn process_checkpoint_marker(&self, epoch: CheckpointEpoch)
        -> Result<(), PluginError> { /* prepare */ Ok(()) }

    async fn process_checkpoint_finalizer(&self, epoch: CheckpointEpoch)
        -> Result<(), PluginError> { /* commit  */ Ok(()) }
}

register_plugin_sink!("sqs", SqsSink);
In production

The engine behind Turbo Pipelines at Goldsky.

Streamling has run in production for months as the engine for Goldsky's flagship product. It powers thousands of pipelines moving blockchain data for some of the largest teams in crypto.

1000s
of production pipelines
150+
chains streamed in real time

Streamling powers Turbo, our newest product, which in turn powers products at Polymarket, Stripe, Phantom, and many more. Since launch, teams have built thousands of pipelines on it.

Jeff Ling
Jeff Ling
Co-founder & CTO, Goldsky
GOLDSKY

Ship a streaming pipeline before lunch.

One binary. YAML config. Run it against a print sink locally, swap in Postgres in production.

Read on GitHub