Ibex | Fast Typed DataFrame Pipelines

Performance

Single-core speed that still shows up at scale

Ibex is currently single-threaded, yet stays competitive with engines using all cores on many common columnar queries.

Language

DataFrame work without stringly SQL glue

Named clauses describe each transformation, column references are checked statically, and type errors are caught before the pipeline runs.

Deployment

One pipeline, several surfaces

Use the REPL for exploration, notebooks for analysis, plugins for I/O, and C++23 codegen for native binaries.

Performance

Speed, measured

Ibex is built for the expensive part of table work: grouped aggregation, rolling time windows, joins, filters, null handling, and reshaping. The benchmark suite compares each query against the same operation in Polars, DuckDB, ClickHouse, DataFusion, pandas, data.table, and dplyr.

Query (16M rows)	Ibex	Other engines
mean by symbol	33.6 ms	Polars: 60.5 ms. Polars single-threaded: 216 ms.
count by symbol x day	30.6 ms	Polars: 220 ms. DuckDB: 77.6 ms. DataFusion: 58.1 ms.
rolling sum 1m	49.7 ms	Polars: 152 ms. Polars single-threaded: 283 ms.

Ibex currently runs on one thread. The benchmark page shows both default engine settings and single-threaded Polars for a same-core comparison.

How a program is shaped

Everything is a table flowing through a pipeline

An Ibex program is a handful of let bindings. Each one names a table and the steps applied to it. There are no loops or mutable variables — you describe the transformation you want and Ibex runs it.

Pipelines

A table, followed by the steps to apply to it

Write a table name, then square brackets containing a comma-separated list of clauses — one operation each. The clauses run in the order you read them, each taking the whole table and producing a new one.

Tables are values: a pipeline returns a new table and leaves its input alone. You can name the result with let, or feed it straight into another set of brackets.

// Keep the busy rows, then three columns
prices[
    filter volume > 1000,
    select { symbol, price, volume }
];

// Name a result and reuse it
let active = prices[filter volume > 1000];
active[select { symbol, price }];
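For readers coming from general-purpose languages, the same two steps can be pictured in plain Python. This is only an analogue of the semantics described above (each clause takes a table and returns a new one, leaving the input alone); it is not how Ibex is implemented. The helper names filter_rows and select_columns, the sample rows, and the extra venue column are all made up for illustration.

```python
# Rough Python analogue of:
#   prices[filter volume > 1000, select { symbol, price, volume }]
# Tables are modeled as lists of dicts. Illustration only, not Ibex internals.

def filter_rows(table, predicate):
    """Return a new table holding only the rows where predicate(row) is true."""
    return [dict(row) for row in table if predicate(row)]

def select_columns(table, names):
    """Return a new table that keeps only the named columns, in the given order."""
    return [{name: row[name] for name in names} for row in table]

# Hypothetical sample data; "venue" exists only to show that select drops it.
prices = [
    {"symbol": "AAA", "price": 10.0, "volume": 500, "venue": "X"},
    {"symbol": "BBB", "price": 20.0, "volume": 1500, "venue": "Y"},
    {"symbol": "AAA", "price": 10.5, "volume": 2500, "venue": "X"},
]

# Each step produces a fresh table, so the intermediate result can be named and reused.
active = filter_rows(prices, lambda row: row["volume"] > 1000)
result = select_columns(active, ["symbol", "price", "volume"])

print(result)
print(len(prices))  # the input table still has all of its rows and columns
```

Because neither helper mutates its argument, naming the intermediate table (active) or chaining the calls directly gives the same result, which mirrors the choice between let and back-to-back brackets.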

Reading the syntax

Three kinds of bracket, one job each

Ibex uses square brackets, braces, and parentheses for three distinct things. Knowing which is which is most of what it takes to read any snippet.

`[ ]` — a pipeline

Square brackets attach to a table and hold a list of clauses to apply: prices[filter …, select …]. Chaining […][…] just feeds one result into the next.

`{ }` — a list of fields

Braces hold the named members a clause works on — the output columns of select, the sort keys of order, the columns of a schema. Think struct fields, not a code block: { avg = mean(px), n = count() }. A schema ascription such as rows as { px: Float64 } requires px, permits other physical columns, and exposes only px to subsequent static checks.

`( )` — calls and grouping

Parentheses are the familiar kind: they call a function, as in mean(price), and group arithmetic, as in (close - open) / open. The expressions inside clauses are ordinary too: comparisons, math, function calls.

A worked example

One pipeline, start to finish

A common task, collapsing tick data into one open/high/low/close bar per symbol, shows how the pieces fit together.

Step by step

Group by symbol, reduce each group to one row

select chooses the output columns. Each entry is name = expression; a bare name passes a column through unchanged.

by symbol groups the rows, so the aggregates in select (first, max, min, last, sum) run once per symbol. Drop the by and they would collapse the whole table to a single row instead.

order then sorts the result. The whole thing is one expression, bound to bars.

let bars = ticks[
    select {
        open  = first(price),
        high  = max(price),
        low   = min(price),
        close = last(price),
        vol   = sum(size)
    },
    by symbol,
    order symbol
];
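To make the grouped reduction concrete, here is a plain-Python sketch of what the bars pipeline computes. It rests on two assumptions the page does not spell out: that first and last follow the table's row order, and that the grouping key symbol appears as a column in the output. The function name bars_by_symbol and the sample ticks are hypothetical.

```python
# Plain-Python sketch of: ticks[select { open = first(price), ... }, by symbol, order symbol]
# Assumptions (not confirmed by the page): first/last follow row order, and the
# grouping key is carried into the output. Illustration only, not Ibex internals.

def bars_by_symbol(ticks):
    """Collapse ticks into one open/high/low/close/vol row per symbol, sorted by symbol."""
    groups = {}
    for tick in ticks:  # rows are visited in table order
        price, size = tick["price"], tick["size"]
        bar = groups.get(tick["symbol"])
        if bar is None:
            # First row of the group seeds every aggregate.
            groups[tick["symbol"]] = {
                "symbol": tick["symbol"],
                "open": price, "high": price, "low": price, "close": price,
                "vol": size,
            }
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price  # the latest row seen so far
            bar["vol"] += size
    # Equivalent of "order symbol": ascending sort on the key.
    return [groups[symbol] for symbol in sorted(groups)]

# Hypothetical sample data.
ticks = [
    {"symbol": "MSFT", "price": 100.0, "size": 10},
    {"symbol": "AAPL", "price": 50.0, "size": 5},
    {"symbol": "MSFT", "price": 102.0, "size": 20},
    {"symbol": "AAPL", "price": 49.0, "size": 7},
    {"symbol": "MSFT", "price": 101.0, "size": 30},
]

bars = bars_by_symbol(ticks)
for bar in bars:
    print(bar)
```

Five input rows become two output rows, one per symbol. Without the grouping step, the same aggregates would run over all five rows and yield a single row.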

The vocabulary

The core clauses

These drop inside [ ] and compose in any sensible order. The function reference and cheat sheet have the rest.

`filter predicate`	Keep rows where the predicate is true
`select { fields }`	Choose or compute output columns; aggregates when paired with `by`
`update { fields }`	Add or replace columns, keeping all existing ones
`where predicate update { fields }`	Replace columns in selected rows
`by key`	Group for `select` / `update` (like SQL `GROUP BY` / `PARTITION BY`)
`order { keys }`	Sort, with per-key `asc` / `desc`
`rename { map }`	Relabel columns without touching data
`distinct { keys }`	Deduplicate on one or more columns
`head n` / `tail n`	Keep the first / last n rows (per group with `by`)
`a join b on key`	Inner / left / right / outer / semi / anti / cross / as-of joins
`window duration`	Lookback window for rolling aggregates on a `TimeFrame` (e.g. `window 5m`)
`resample duration`	Bucket a `TimeFrame` into fixed time intervals, then aggregate per bucket (e.g. 1m OHLC)
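
The window entry in the table above can be pictured as a time-based lookback that slides along sorted rows. The sketch below is one possible reading in plain Python, not Ibex's implementation: it assumes rows are sorted by timestamp and that the window is half-open, covering (t - w, t] for a row at time t. The page does not state the boundary rule, so treat that as an assumption. The name rolling_sum and the sample data are hypothetical.

```python
from collections import deque

# Sketch of a time-based rolling sum, in the spirit of "window 1m" plus a sum aggregate.
# Assumption: the lookback for a row at time t covers (t - w, t]; Ibex's actual
# boundary rule is not stated on this page.

def rolling_sum(rows, window_seconds):
    """rows: (timestamp_seconds, value) pairs sorted by timestamp. Returns one sum per row."""
    window = deque()  # rows currently inside the lookback window
    total = 0
    sums = []
    for ts, value in rows:
        window.append((ts, value))
        total += value
        # Evict rows that are at least window_seconds older than the current row.
        while window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        sums.append(total)
    return sums

# Hypothetical sample data: timestamps in seconds, integer values.
rows = [(0, 1), (30, 2), (60, 3), (90, 4), (200, 5)]
sums = rolling_sum(rows, 60)
print(sums)
```

Each row is appended once and evicted at most once, so the pass is linear in the number of rows regardless of window length.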

Outside the pipelines are a few top-level forms: let bindings, import to load a plugin, fn / extern fn for reusable functions and data sources, and Table { … } to build a table from literals.

Install & run

Get Ibex on your machine

A prebuilt release is the quickest start. Build from source if you want the latest commits or a binary for your own platform.

Option A — download a release

Grab the prebuilt ibex REPL and bundled plugins, unpack, and run — no toolchain required.

github.com/bobjansen/Ibex/releases

# Unpack the archive for your platform, then:
./ibex --plugin-path ./plugins

Option B — build from source

Requirements: CMake 3.26+ and a C++23 compiler such as Clang 17+, GCC 13+, AppleClang, or MSVC 2022. Ninja is recommended on Linux and macOS; CMake's Visual Studio generator works on Windows.

# Linux/macOS with Clang or GCC
cmake -B build-release -G Ninja \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++ \
  -DCMAKE_BUILD_TYPE=Release
cmake --build build-release

# Windows, from a Developer PowerShell
cmake -B build-release -DCMAKE_BUILD_TYPE=Release
cmake --build build-release --config Release

Run your first pipeline

./ibex --plugin-path ./plugins   # from source: ./build-release/tools/ibex --plugin-path ./build-release/tools

import "csv";
let prices = read_csv("prices.csv");

// Five most-traded symbols by total volume
prices[
    select { traded = sum(volume) }, by symbol,
    order { traded desc },
    head 5
];
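
As a cross-check on what this first pipeline should return, the same query can be sketched in plain Python: sum volume per symbol, sort descending, keep five rows. The function name top_traded and the sample rows are hypothetical, and carrying the symbol key into the output is an assumption about how grouped results are shaped.

```python
from collections import defaultdict

# Plain-Python sketch of:
#   prices[select { traded = sum(volume) }, by symbol, order { traded desc }, head 5]
# Illustration only; the output shape (symbol plus traded) is an assumption.

def top_traded(rows, n=5):
    """Return the n symbols with the largest total volume, largest first."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["symbol"]] += row["volume"]  # grouped sum
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [{"symbol": symbol, "traded": traded} for symbol, traded in ranked[:n]]

# Hypothetical sample data standing in for prices.csv.
sample = [
    {"symbol": "A", "volume": 100},
    {"symbol": "B", "volume": 300},
    {"symbol": "A", "volume": 250},
    {"symbol": "C", "volume": 50},
]

top = top_traded(sample, n=2)
print(top)
```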

Handy REPL commands: :load <file.ibex>, :schema <table>, :head <table> [n], :doc <name>, :help.

Where to go next

Keep going

Getting started

Benchmarks

Comparison

Reference

I/O guide

Function reference

Cheat sheet
