Ibex

Typed table pipelines, from REPL to native code

Ibex gives DataFrame pipelines their own compact, statically typed language. Explore in a REPL, embed the same code in Python or R notebooks, and compile it to C++23 when the pipeline needs to ship.

Typed table expressions · REPL to C++23 · Single-core performance focus

Performance

Single-core speed that still shows up at scale

Ibex is currently single-threaded, yet stays competitive with engines using all cores on many common columnar queries.

Language

DataFrame work without stringly SQL glue

Clauses compose left-to-right, columns are real names, and static types catch mistakes before a pipeline runs.

Deployment

One pipeline, several surfaces

Use the REPL for exploration, notebooks for analysis, plugins for I/O, and C++23 codegen for native binaries.

Speed, measured

Ibex is built for the expensive part of table work: grouped aggregation, rolling time windows, joins, filters, null handling, and reshaping. The benchmark suite compares each query against the same operation in Polars, DuckDB, ClickHouse, DataFusion, pandas, data.table, and dplyr.

33.6 ms

mean by symbol, 16M rows

Polars: 60.5 ms. Polars single-threaded: 216 ms.

30.6 ms

count by symbol x day, 16M rows

Polars: 220 ms. DuckDB: 77.6 ms. DataFusion: 58.1 ms.

49.7 ms

rolling sum 1m, 16M rows

Polars: 152 ms. Polars single-threaded: 283 ms.

Ibex currently runs on one thread. The benchmark page shows both default engine settings and single-threaded Polars for a same-core comparison.

Everything is a table flowing through a pipeline

An Ibex program is a handful of let bindings. Each one names a table and the steps applied to it. There are no loops or mutable variables — you describe the transformation you want and Ibex runs it.

Pipelines

A table, followed by the steps to apply to it

Write a table name, then square brackets containing a comma-separated list of clauses — one operation each. The clauses run in the order you read them, each taking the whole table and producing a new one.

Tables are values: a pipeline returns a new table and leaves its input alone. You can name the result with let, or feed it straight into another set of brackets.

// Keep the busy rows, then three columns
prices[
    filter volume > 1000,
    select { symbol, price, volume }
];

// Name a result and reuse it
let active = prices[filter volume > 1000];
active[select { symbol, price }];
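
For readers who think in general-purpose code, the evaluation model can be sketched in plain Python. This is only an illustration of the semantics described above, not how Ibex is implemented: filter_rows, select_cols, and pipeline are hypothetical helper names, and a table is modeled as a tuple of row dictionaries. Each clause is a function from table to table, clauses are applied in the order written, and the input is never modified.

```python
# Illustrative model only: Ibex itself is a typed, compiled language.
# Every name below is hypothetical and exists only for this sketch.
from functools import reduce

def filter_rows(predicate):
    """Clause: keep the rows for which predicate(row) is true."""
    return lambda table: tuple(row for row in table if predicate(row))

def select_cols(*names):
    """Clause: keep only the named columns, in the given order."""
    return lambda table: tuple({n: row[n] for n in names} for row in table)

def pipeline(table, *clauses):
    """Apply clauses left to right; each takes a table and returns a new one."""
    return reduce(lambda current, clause: clause(current), clauses, table)

prices = (
    {"symbol": "AAA", "price": 10.0, "volume": 500},
    {"symbol": "BBB", "price": 20.0, "volume": 1500},
    {"symbol": "CCC", "price": 30.0, "volume": 2500},
)

# Mirrors: prices[filter volume > 1000, select { symbol, price, volume }]
busy = pipeline(
    prices,
    filter_rows(lambda row: row["volume"] > 1000),
    select_cols("symbol", "price", "volume"),
)

# Mirrors: let active = prices[filter volume > 1000];
#          active[select { symbol, price }];
active = pipeline(prices, filter_rows(lambda row: row["volume"] > 1000))
narrow = pipeline(active, select_cols("symbol", "price"))

print(busy)
print(narrow)
print(len(prices))  # the input table still has all three rows
```

Chaining two pipeline calls in the sketch corresponds to writing two sets of brackets back to back: the second simply starts from the table the first returned.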

Three kinds of bracket, one job each

Ibex uses square brackets, braces, and parentheses for three distinct things. Knowing which is which is most of what it takes to read any snippet.

[ ] — a pipeline

Square brackets attach to a table and hold a list of clauses to apply: prices[filter …, select …]. Chaining […][…] just feeds one result into the next.

{ } — a list of fields

Braces hold the named members a clause works on — the output columns of select, the sort keys of order, the columns of a schema. Think struct fields, not a code block: { avg = mean(px), n = count() }.

( ) — calls and grouping

Parentheses are the familiar kind: calling a function and grouping arithmetic. mean(price), (close - open) / open. The expressions inside clauses are ordinary too — comparisons, math, function calls.

One pipeline, start to finish

A common task — collapse tick data into daily bars per symbol — shows how the pieces fit together.

Step by step

Group by symbol, reduce each group to one row

select chooses the output columns. Each entry is name = expression; a bare name passes a column through unchanged.

by symbol groups the rows, so the aggregates in select (first, max, min, last, and sum) run once per symbol. Drop the by and they would collapse the whole table to a single row instead.

order then sorts the result. The whole thing is one expression, bound to bars.

let bars = ticks[
    select {
        open  = first(price),
        high  = max(price),
        low   = min(price),
        close = last(price),
        vol   = sum(size)
    },
    by symbol,
    order symbol
];
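
To show what this pipeline computes, here is a minimal plain-Python sketch of the same grouped reduction. It is an illustration under stated assumptions, not Ibex's implementation: bars_by_symbol and the sample ticks are invented for this example, rows are assumed to arrive in time order (so first and last mean earliest and latest), and the group key is assumed to appear as a column of the result.

```python
# Illustrative model only; the function name and the sample data are hypothetical.
def bars_by_symbol(ticks):
    """Group ticks by symbol and reduce each group to one row."""
    groups = {}
    for tick in ticks:  # assumption: ticks are already in time order
        groups.setdefault(tick["symbol"], []).append(tick)

    bars = []
    for symbol in sorted(groups):  # plays the role of: order symbol
        rows = groups[symbol]
        prices = [row["price"] for row in rows]
        bars.append({
            "symbol": symbol,
            "open": prices[0],     # first(price)
            "high": max(prices),   # max(price)
            "low": min(prices),    # min(price)
            "close": prices[-1],   # last(price)
            "vol": sum(row["size"] for row in rows),  # sum(size)
        })
    return bars

ticks = [
    {"symbol": "BBB", "price": 20.0, "size": 10},
    {"symbol": "AAA", "price": 10.0, "size": 5},
    {"symbol": "AAA", "price": 12.0, "size": 7},
    {"symbol": "BBB", "price": 19.0, "size": 3},
    {"symbol": "AAA", "price": 11.0, "size": 1},
]

bars = bars_by_symbol(ticks)
for bar in bars:
    print(bar)
```

Five input rows become two output rows, one per symbol. Without the grouping step, the same five aggregates would produce a single row for the whole table.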

The core clauses

These drop inside [ ] and compose in any sensible order. The function reference and cheat sheet have the rest.

filter predicate                     Keep rows where the predicate is true
select { fields }                    Choose or compute output columns; aggregates when paired with by
update { fields }                    Add or replace columns, keeping all existing ones
where predicate update { fields }    Replace columns in selected rows
by key                               Group for select / update (like SQL GROUP BY / PARTITION BY)
order { keys }                       Sort, with per-key asc / desc
rename { map }                       Relabel columns without touching data
distinct { keys }                    Deduplicate on one or more columns
head n / tail n                      Keep the first / last n rows (per group with by)
a join b on key                      Inner / left / right / outer / semi / anti / cross / as-of joins
window duration                      Lookback window for rolling aggregates on a TimeFrame (e.g. window 5m)
resample duration                    Bucket a TimeFrame into fixed time intervals, then aggregate per bucket (e.g. 1m OHLC)
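
The window clause is the least familiar entry for readers coming from SQL, so here is a rough plain-Python sketch of a rolling sum over a time lookback. It models the idea only. The name rolling_sum, the use of integer seconds for timestamps, and the boundary rule are assumptions made for this example: the window at time t is taken to cover timestamps in (t - duration, t], which the text above does not specify for Ibex.

```python
# Illustrative model only. Assumption: the lookback window at time t covers
# the half-open interval (t - duration, t]. Ibex's exact boundary rule may differ.
from collections import deque

def rolling_sum(rows, duration):
    """rows: (timestamp_seconds, value) pairs sorted by timestamp; duration > 0."""
    window = deque()  # rows currently inside the lookback window
    total = 0.0
    out = []
    for ts, value in rows:
        window.append((ts, value))
        total += value
        # Evict rows that have fallen out of the window. The current row
        # always stays, so the deque is never empty here.
        while window[0][0] <= ts - duration:
            _, old_value = window.popleft()
            total -= old_value
        out.append((ts, total))
    return out

ticks = [(0, 1.0), (30, 2.0), (59, 3.0), (60, 4.0), (130, 5.0)]
result = rolling_sum(ticks, 60)  # a one-minute lookback
print(result)
```

Each row enters and leaves the deque at most once, so the sketch makes a single pass over sorted input regardless of window length.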

Outside the pipelines are a few top-level forms: let bindings, import to load a plugin, fn / extern fn for reusable functions and data sources, and Table { … } to build a table from literals.

Get Ibex on your machine

A prebuilt release is the quickest start. Build from source if you want the latest commits or a binary for your own platform.

Option A — download a release

Grab the prebuilt ibex REPL and bundled plugins, unpack, and run — no toolchain required.

github.com/bobjansen/Ibex/releases

# Unpack the archive for your platform, then:
./ibex --plugin-path ./plugins

Option B — build from source

Requirements: CMake 3.26+ and a C++23 compiler such as Clang 17+, GCC 13+, AppleClang, or MSVC 2022. Ninja is recommended on Linux and macOS; CMake's Visual Studio generator works on Windows.

# Linux/macOS with Clang or GCC
cmake -B build-release -G Ninja \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++ \
  -DCMAKE_BUILD_TYPE=Release
cmake --build build-release

# Windows, from a Developer PowerShell
cmake -B build-release -DCMAKE_BUILD_TYPE=Release
cmake --build build-release --config Release

Run your first pipeline

# Start the REPL (from source: ./build-release/tools/ibex --plugin-path ./build-release/tools)
./ibex --plugin-path ./plugins

// Then, at the REPL prompt:
import "csv";
let prices = read_csv("prices.csv");

// Five most-traded symbols by total volume
prices[
    select { traded = sum(volume) }, by symbol,
    order { traded desc },
    head 5
];
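
As a cross-check on what the three clauses do together, the same query can be sketched in plain Python. This is an illustration, not Ibex code: top_traded and the sample rows are hypothetical, and ties are broken by whatever order Python's stable sort leaves them in, which may not match Ibex.

```python
# Illustrative model only; names and data are invented for this sketch.
from collections import defaultdict

def top_traded(prices, n=5):
    """Total volume per symbol, largest first, keeping the first n rows."""
    totals = defaultdict(int)
    for row in prices:  # select { traded = sum(volume) }, by symbol
        totals[row["symbol"]] += row["volume"]
    # order { traded desc }
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    # head n
    return [{"symbol": symbol, "traded": traded} for symbol, traded in ranked[:n]]

prices = [
    {"symbol": "AAA", "volume": 100},
    {"symbol": "BBB", "volume": 400},
    {"symbol": "AAA", "volume": 250},
    {"symbol": "CCC", "volume": 50},
]

top = top_traded(prices, n=2)
print(top)
```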

Handy REPL commands: :load <file.ibex>, :schema <table>, :head <table> [n], :doc <name>, :help.

Keep going

Benchmarks

Interactive timings and memory use against Polars, DuckDB, ClickHouse, DataFusion, pandas, and R.

Comparison

The same query in Ibex, pandas, Polars, and SQL, side by side.

Reference

A guided walk through every clause, with runnable snippets for deeper evaluation.

I/O guide

CSV, Parquet, SQLite via ADBC, and Kafka streaming into live dashboards.

Cheat sheet

One-page syntax and function reference once you know what you are looking for.