Open source, by Goldsky

Columnar streaming with native-speed plugins

Name: Streamling
Author: Goldsky

A streaming runtime built in Rust on Arrow and DataFusion. Write plugins that run on the same zero-copy data plane as the built-ins. Define pipelines in YAML and ship in seconds.

pipeline.yaml

sources:
  raw.transactions:
    type: kafka
    topic: raw.event.transaction

transforms:
  large_transactions:
    type: sql
    primary_key: id
    sql: |
      SELECT *
      FROM raw.transactions
      WHERE amount > 1000

sinks:
  pg.large_transactions:
    from: large_transactions
    type: postgres
    schema: public
    table: large_transactions
    primary_key: id

Why Streamling

Extend without forking

Extending a typical engine means maintaining a custom build, or running your logic out of process and paying a serialization round trip on every message. Streamling loads your Rust code as a native DataFusion operator. It reads the same Arrow batches as the built-ins, and the runtime wraps it in checkpointing and backpressure.

Write once, run in every pipeline

A plugin you build for one pipeline (a Postgres enrichment lookup, a partner-API sink) drops into any other by id. Teams build a library of domain operators, and the runtime treats them like built-ins.

Write transforms in TypeScript or SQL

WebAssembly-sandboxed JS/TS transforms run a process(input) function per record. Or use full DataFusion SQL with hundreds of built-in functions.

HTTP endpoints are first-class

Most app data lives behind APIs. Streamling supports HTTP for enrichment transforms and webhook sinks, so pipelines are not limited to Kafka.

Write to the databases you already use

Postgres, ClickHouse, and Kafka are core sinks. Schema and tables are created automatically, and upsert semantics are tracked through the engine.

Single binary

No coordinator or ZooKeeper. Tens of thousands of messages per second on half a CPU core.

Architecture

Pragmatic by design

Streamling makes opinionated trade-offs so application teams don't have to babysit infrastructure. Less to learn. Less to operate.

COLUMNAR, ZERO-COPY ARROW

One data plane

Every operator, built-in or plugin, is a pull-based DataFusion ExecutionPlan passing Arrow RecordBatches over Tokio streams. One representation end to end is why SQL, TypeScript, HTTP handlers, and native plugins compose freely, and why you get tens of thousands of messages per second on half a CPU core.
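
To make the pull-based model concrete, here is a minimal sketch using only the Rust standard library. It is not DataFusion's ExecutionPlan API: plain iterators stand in for streams, a Vec<i64> stands in for an Arrow RecordBatch, and the names source, filter_gt, and large_transactions are hypothetical.

```rust
// Simplified model: a "batch" is one column of amounts. Operators are
// iterators, so nothing runs until the downstream consumer pulls a batch.
type Batch = Vec<i64>;

/// Hypothetical source: serves an in-memory column in fixed-size batches.
/// Panics if `batch_size` is 0 (a limitation of `slice::chunks`).
fn source(rows: Vec<i64>, batch_size: usize) -> impl Iterator<Item = Batch> {
    let batches: Vec<Batch> = rows.chunks(batch_size).map(|c| c.to_vec()).collect();
    batches.into_iter()
}

/// Filter operator: keeps values above `min`, one batch at a time.
fn filter_gt(input: impl Iterator<Item = Batch>, min: i64) -> impl Iterator<Item = Batch> {
    input.map(move |batch| batch.into_iter().filter(|v| *v > min).collect::<Batch>())
}

/// Wires source -> filter and drains the plan, like a sink pulling batches.
pub fn large_transactions() -> Vec<Batch> {
    let plan = filter_gt(source(vec![500, 1500, 2500, 10, 4000], 2), 1000);
    plan.collect()
}
```

Because every stage consumes and produces the same batch type, stages can be stacked in any order. That composability is the property the single data plane is meant to provide.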

BUILT ON DATAFUSION

Single-node execution

No distributed shuffle or coordinator. Scale out horizontally with Kafka consumer groups: each instance picks up its own share of the topic's partitions.
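
As a rough illustration of why consumer groups are enough for scale-out, the sketch below spreads partitions over instances round-robin. This is an assumption made for the example: Kafka's group coordinator performs the real assignment and its assignors use different strategies. The function name assign_partitions is hypothetical.

```rust
/// Spread `partitions` over `instances` round-robin and return the list of
/// partitions owned by each instance. Illustration only: in a real consumer
/// group the assignment is done by Kafka, not by the pipeline.
fn assign_partitions(partitions: u32, instances: usize) -> Vec<Vec<u32>> {
    if instances == 0 {
        return Vec::new();
    }
    let mut owned = vec![Vec::new(); instances];
    for p in 0..partitions {
        owned[p as usize % instances].push(p);
    }
    owned
}
```

With 6 partitions, two instances own three partitions each. Starting a third instance drops that to two each, with no shuffle between instances.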

STATE BACKEND OPTIONAL

Stateless

Operators are stateless by default. A pluggable state backend stores small bits of metadata, like Kafka offsets, in SQLite or Postgres.

CHANDY–LAMPORT CHECKPOINTS

At-least-once delivery

Markers flow source → transforms → sinks. Sinks flush before sources commit. Upsert-aware sinks make redelivery harmless.
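
The ordering above can be sketched with a small simulation. It models only the two properties named here (sinks flush before sources commit, and upserts make redelivery harmless), not the Chandy-Lamport protocol itself. UpsertSink, run_epoch, and crash_then_redeliver are hypothetical names, and the rows are made up.

```rust
use std::collections::BTreeMap;

/// Hypothetical upsert sink: rows are keyed by primary key, so writing the
/// same row twice leaves the table unchanged.
#[derive(Default)]
struct UpsertSink {
    pending: Vec<(u64, i64)>,
    table: BTreeMap<u64, i64>,
}

impl UpsertSink {
    fn write(&mut self, id: u64, amount: i64) {
        self.pending.push((id, amount));
    }

    /// Apply buffered rows; called when a checkpoint marker reaches the sink.
    fn flush(&mut self) {
        for (id, amount) in self.pending.drain(..) {
            self.table.insert(id, amount);
        }
    }
}

/// Delivers `log[committed..]` to the sink, flushes, then returns the source
/// offset. With `crash_before_commit` the old offset is returned, modelling a
/// crash after the sink flushed but before the source committed.
fn run_epoch(log: &[(u64, i64)], committed: usize, sink: &mut UpsertSink, crash_before_commit: bool) -> usize {
    for &(id, amount) in &log[committed..] {
        sink.write(id, amount);
    }
    sink.flush(); // step 1: the sink flushes first
    if crash_before_commit {
        committed // step 2a: crash, the offset does not advance
    } else {
        log.len() // step 2b: the source commits only after the flush
    }
}

/// Runs one crashed epoch followed by a clean one over the same input.
pub fn crash_then_redeliver() -> (usize, BTreeMap<u64, i64>) {
    let log: [(u64, i64); 3] = [(1, 1500), (2, 2500), (3, 4000)];
    let mut sink = UpsertSink::default();
    let offset = run_epoch(&log, 0, &mut sink, true); // rows written, offset lost
    let offset = run_epoch(&log, offset, &mut sink, false); // same rows redelivered
    (offset, sink.table)
}
```

All three rows are written twice, yet the table ends with three rows and the offset ends at 3. Committing before flushing would risk losing rows instead, which is why the order matters.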

LIVE LOOKUPS, NO STATE

Dynamic tables, not joins

Need to filter against a list of IDs? Point a SQL transform at a Postgres-backed dynamic table. Update the lookup data without restarting the pipeline.
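
The idea can be sketched in a few lines, assuming an in-memory set in place of the Postgres-backed table. DynamicTable, filter_batch, and swap_lookup are hypothetical names for this example, not Streamling's API.

```rust
use std::collections::HashSet;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for a dynamic table: a shared set of allowed IDs
/// that can be replaced while the transform keeps running.
#[derive(Clone, Default)]
struct DynamicTable {
    ids: Arc<RwLock<HashSet<u64>>>,
}

impl DynamicTable {
    /// Swap in new lookup data (in the real system this lives in Postgres).
    fn replace(&self, ids: impl IntoIterator<Item = u64>) {
        *self.ids.write().unwrap() = ids.into_iter().collect();
    }

    fn contains(&self, id: u64) -> bool {
        self.ids.read().unwrap().contains(&id)
    }
}

/// Filter transform: keeps only the IDs currently present in the table.
fn filter_batch(table: &DynamicTable, batch: &[u64]) -> Vec<u64> {
    batch.iter().copied().filter(|id| table.contains(*id)).collect()
}

/// Filters the same batch before and after the lookup data changes.
pub fn swap_lookup() -> (Vec<u64>, Vec<u64>) {
    let table = DynamicTable::default();
    table.replace([1, 2]);
    let before = filter_batch(&table, &[1, 2, 3]);
    table.replace([3]); // lookup data changes; the transform is not rebuilt
    let after = filter_batch(&table, &[1, 2, 3]);
    (before, after)
}
```

The transform holds no state of its own. It consults the table on every batch, so an update takes effect on the next batch processed.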

BUILD → VALIDATE → RUN

Delightful DX

Live inspection of in-flight data. Print and blackhole sinks for debugging. Instant startup. OpenTelemetry built in. A validate mode for CI and agentic tools.

Connectors

Snap together the things you already use

Kafka, Postgres, ClickHouse, HTTP, SQL, TypeScript. These are just the built-in connectors; more are available as plugins.

Sources

Kafka

JSON or Avro + Schema Registry. Consumer groups. Tracked offsets.

kafka.yaml

sources:
  ethereum.raw_blocks:
    type: kafka
    topic: mainnet.raw.blocks
    # json or avro supported
    data_format: avro
    # Offsets committed only after sinks flush.

ClickHouse

Keyset pagination over sorting keys. Adaptive block ranges.
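
Keyset pagination resumes from the last key seen instead of using an offset, roughly `WHERE key > last ORDER BY key LIMIT n`. The sketch below shows the technique over a sorted in-memory slice. It is an illustration rather than the connector's code, it does not model adaptive block ranges, and next_page and read_all are hypothetical names.

```rust
/// Return up to `limit` keys strictly greater than `after`.
/// `rows` must be sorted ascending, like a table ordered by its sorting key.
fn next_page(rows: &[u64], after: Option<u64>, limit: usize) -> &[u64] {
    let start = match after {
        // Binary search for the first key greater than the cursor.
        Some(last) => rows.partition_point(|key| *key <= last),
        None => 0,
    };
    let end = start.saturating_add(limit).min(rows.len());
    &rows[start..end]
}

/// Read everything page by page, carrying only the last key as the cursor.
fn read_all(rows: &[u64], limit: usize) -> Vec<u64> {
    let mut cursor = None;
    let mut seen = Vec::new();
    loop {
        let page = next_page(rows, cursor, limit);
        let Some(&last) = page.last() else { break }; // empty page: done
        seen.extend_from_slice(page);
        cursor = Some(last);
    }
    seen
}
```

The cursor is a key rather than a row count, so each page costs the same however deep the read is, and a restart needs only the last key.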

File

Local paths or S3/GCS. CSV, JSON, Parquet, Avro. Bounded or polling.

Transforms

SQL

Full DataFusion SQL. Projections, filters, UDFs.

TypeScript / WASM

Sandboxed JS/TS via Extism. One process(input) function.

HTTP handler

Enrich records via your API. Batched, retried.

Dynamic tables

Live, externally-updatable lookup tables.

Sinks

Postgres

Auto-created tables. U256/I256 support. Upserts.

Postgres aggregate

Real-time aggregations via DB triggers.

ClickHouse

Auto-created tables. ReplacingMergeTree upserts. Gzip compression.

Kafka

Avro + Schema Registry. Operation headers.

Webhook

Push records to any HTTP endpoint as JSON.

Print

Stdout JSON. Perfect for local dev.

S3 (plugin)

Parquet files to S3-compatible storage. Optional Hive partitioning.
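
Hive partitioning encodes partition columns as `column=value` directories in the object key. A minimal sketch of that layout follows. The function name hive_path, the bucket, the partition columns, and the file name are all hypothetical, and the plugin's actual naming scheme may differ.

```rust
/// Build a Hive-style object key: <prefix>/<col>=<val>/.../<file>.
fn hive_path(prefix: &str, partitions: &[(&str, &str)], file: &str) -> String {
    let mut path = prefix.trim_end_matches('/').to_string();
    for (column, value) in partitions {
        path.push_str(&format!("/{column}={value}"));
    }
    path.push('/');
    path.push_str(file);
    path
}
```

For example, `hive_path("s3://bucket/blocks", &[("chain", "ethereum"), ("date", "2024-01-01")], "part-0.parquet")` yields `s3://bucket/blocks/chain=ethereum/date=2024-01-01/part-0.parquet`. Query engines that understand this layout can skip directories whose partition values do not match a filter.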

MySQL (plugin)

Upsert + delete CDC. Auto-created tables.

SQS (plugin)

Each row as a JSON message. Batched, retried.

Built-ins cover the common cases. Everything else is a plugin, and plugins run at native speed with the same guarantees.

Plugin system

Your code is a first-class operator

Write a plugin for what's specific to you: read and enrich from Postgres, poll a partner API, push to your warehouse. Reuse it across every pipeline by id. Sources, transforms, sinks, UDFs, and topology preprocessors all share one trait.

Plugins load from a shared library at startup and run on the zero-copy data plane, not in a subprocess behind a serialization hop. The runtime hands your code the same Arrow batches and the same guarantees as the built-ins: backpressure, checkpoint markers, commit ordering, upsert propagation. The S3, MySQL, and SQS sinks above are plugins.

Stable ABI via abi_stable

Arrow RecordBatch over FFI

Backpressure handled for you

OpenTelemetry metrics built in

Async runtime provided

State backend access

sqs_sink.rs

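use streamling_plugin::{register_plugin_sink, SinkPlugin};
use streamling_core::{Result, PluginError, CheckpointEpoch};
use arrow::record_batch::RecordBatch;
// Assumed: the #[async_trait] attribute below comes from the async_trait crate.
use async_trait::async_trait;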

pub struct SqsSink { /* ... */ }

#[async_trait]
impl SinkPlugin for SqsSink {
    async fn initialize(&self) -> Result<(), PluginError> {
        // open connection, register schema, etc.
        Ok(())
    }

    async fn process_batch(&self, data: RecordBatch) -> Result<(), PluginError> {
        // your sink logic — react to the incoming RecordBatch
        self.client.send_batch(data).await?;
        Ok(())
    }

    async fn process_checkpoint_marker(&self, epoch: CheckpointEpoch)
        -> Result<(), PluginError> { /* prepare */ Ok(()) }

    async fn process_checkpoint_finalizer(&self, epoch: CheckpointEpoch)
        -> Result<(), PluginError> { /* commit  */ Ok(()) }
}

register_plugin_sink!("sqs", SqsSink);

In production

The engine behind Turbo Pipelines at Goldsky

Streamling has run in production for months as the engine for Goldsky's flagship product. It powers thousands of pipelines moving blockchain data for some of the largest teams in crypto.

1000s

of production pipelines

150+

chains streamed in real time

Streamling powers Turbo, our newest product, which in turn powers products at Polymarket, Stripe, Phantom, and many more. Since launch, teams have built thousands of pipelines on it.

Jeff Ling

Co-founder & CTO, Goldsky

GOLDSKY

Ship a streaming pipeline before lunch

Run it locally against a print sink, then point the same YAML at Postgres in production.
