Reference

Ibex language tour and reference

This page collects the longer language tour, examples, notebook integration notes, I/O details, and advanced syntax that used to live on the Home page. Start at Home if you want the shorter first-run path.

Never the slowest. Often the fastest. On a single core. Across grouped aggregation, rolling time-series, and null-handling at 1M–50M rows, Ibex is competitive with or faster than multithreaded Polars, DuckDB, DataFusion, and ClickHouse. Using only one thread.

Start on Home View on GitHub Benchmarks

Why?

Ibex is focused solely on tabular data

Ibex only knows tables and scalars, this made it possible to create a new language in which it is possible to write concise, performant code easy to read table manipulation.

Quant and market data

OHLC, rolling windows, as-of joins, feature columns

Work directly with columnar time-series pipelines, keep group and window semantics explicit, then compile the same script when latency or deployment starts to matter.

Analytics engineering

DataFrame expressiveness without stringly SQL glue

Keep transformations readable with bracket clauses, strict types, named arguments, imports, and reusable functions instead of sprawling host-language orchestration.

Notebook power users

Move the heavy part into a real language

Bind pandas, Arrow, or R data frames into Ibex, run the expensive joins and aggregations there, and hand compact results back to the notebook for plotting and review.

Features

Interactive, typed, compiled

Explore in a REPL, trust the static types to catch shape and type errors before anything runs, and compile the same script to C++23 when it ships.

Concise bracket syntax

Pipeline operations chain left-to-right inside [ ], in the order they execute. No wrapping every column in a function call, no quoting names. A bare field list defaults to select, so trades[price] and trades[{ symbol, price }] are projection shorthand.

trades[filter price > 1.0,
  select {
    symbol,
    vwap = sum(price * qty)
         / sum(qty)
  }, by symbol]

trades[{ symbol, price }]

TimeFrames

as_timeframe promotes a DataFrame to a time-indexed structure. Rolling windows are convenient and blazingly fast.

let tf = as_timeframe(ticks,
                      "ts");
tf[window 5m, update {
  avg_5m = rolling_mean(px),
  vol_5m = rolling_sum(qty)
}]

Grouped update

update with by computes a value per group and writes it back to every row, so group statistics sit alongside the originals.

prices[update {
  z = (price - mean(price))
      / std(price)
}, by symbol]

C++23 codegen

ibex_compile transpiles any .ibex script to idiomatic C++ using the ibex::ops::* library. Compiled and interpreted outputs are behaviour-equivalent; both run near peak throughput.

Strict type system

No implicit Int ↔ Float coercion. Explicit cast constructors (Int64(x), Float64(x)) and a five-mode round() built-in — including banker’s rounding — cover every numeric conversion. Optional annotations are validated when present.

Python & R notebooks

Drive Ibex from a notebook: the ibex_ipython extension binds pandas or Arrow tables into %%ibex cells and hands results back to pandas, pyarrow, or matplotlib; the ribex package runs Ibex from pure R and returns a data.frame ready for ggplot2. Both keep table bindings alive across cells.

Notebook integration

Run Ibex inside Python or R notebook workflows

Quickly switch between Python and Ibex cells in a notebook. The results from Ibex can seamlessly be used by later cells in the notebook.

Python Bridge

Use Ibex as a DSL inside a real notebook workflow

The ibex_pyarrow bridge evaluates Ibex code and returns a pyarrow.Table. The ibex_ipython extension adds %%ibex, %ibexfile, and %ibexreset, so notebooks can bind pandas or Arrow tables and ordinary Python scalars into Ibex, keep expensive table loads alive across cells, and keep plotting in Python.

Scalar bindings currently support Python int, float, bool, str, datetime.date, and datetime.datetime, with the last two mapped to Ibex Date and Timestamp.

This keeps Ibex focused on joins, aggregations, reshaping, and time-series work, while reusing pandas, matplotlib, and the rest of the notebook ecosystem for presentation.

Build the bridge with cmake --build build-release --parallel --target ibex_pyarrow, then start IPython from the repository root and load the extension.

# IPython / Jupyter
%load_ext ibex_ipython

import pandas as pd

trades = pd.DataFrame({
    "symbol": ["AAPL", "AAPL", "MSFT"],
    "qty": [10, 15, 7],
    "px": [101.2, 101.5, 299.8],
})
offset = 10

%%ibex --bind trades=trades --bind offset=offset --as pandas --out grouped
trades[
    select { total_qty = sum(qty + offset), avg_px = mean(px) },
    by symbol,
    order symbol
];

grouped.plot(kind="bar", x="symbol", y="total_qty");

Persistent Session

Load once, reuse across notebook cells

%%ibex keeps table-valued let bindings in a hidden Ibex session. Large CSV or parquet loads do not need to be repeated in every cell.

%%ibex --quiet
import "csv";
let train = read_csv("../../kaggle/data/train.csv", "<empty>");

%%ibex --as pandas --out bucket_summary
train[select { rows = count() }, by seconds_in_bucket, order seconds_in_bucket];

%ibexreset  # clear the hidden session

R Package

Use Ibex from pure R and hand results straight to ggplot2

The in-repo ribex package evaluates inline Ibex queries or .ibex files from R. Results come back as a data.frame by default, so the common ggplot2 path is immediate. For lower-level interop, format = "nanoarrow" returns the Arrow-backed result.

Like the Python notebook path, ribex supports a persistent Ibex session, so large CSV or parquet loads can happen once and then be reused across later calls.

For literate reports, ribex can also register a real {ibex} knitr engine inside R Markdown, so Ibex chunks and ordinary R / ggplot2 chunks can share one named session.

R input bindings are available by copy for named data.frame tables and named scalar lists. Scalars currently support R integer, double, logical, character, Date, and POSIXct.

# R
library(ribex)
library(ggplot2)

ribex::register_knitr_engines()

sess <- create_session(
  plugin_paths = c("build-release/tools")
)

session_eval(
  sess,
  '
  import "csv";
  let iris = read_csv("data/iris.csv", "");
  '
)

summary <- session_eval(
  sess,
  '
  iris[
      select { mean_sl = mean(Sepal_Length), n = count() },
      by Species,
      order Species
  ];
  '
)

ggplot(summary, aes(Species, mean_sl)) +
  geom_col()

CSV Options

Read delimited files with or without headers

The bundled CSV plugin supports custom delimiters and headerless files. When has_header is false, Ibex synthesizes column names col1, col2, and so on.

SQLite is also available through the optional ADBC plugin, so the same pipeline can start from a SQL query instead of a flat file when you want database-side staging first.

import "csv";

let raw = read_csv("examples/measurements.txt", "", ";", false)
    [select { station = col1, temp = col2 }];

Compile-Time Map

Generate feature families without leaving Ibex

Braced select and update blocks support a compile-time map form. It expands from earlier string-list let bindings into ordinary named fields before lowering, so runtime execution stays unchanged.

Use backtick aliases with ${name} interpolation and get(name) inside the map body to turn compile-time names into column references.

columns(table_expr) returns a one-column metadata table named name, so generated schemas can stay in Ibex instead of being rebuilt in notebook host code.

let measures = ["price", "fee"];

trades[select {
    symbol,
    map m in measures => `avg_${m}` = mean(get(m))
}, by symbol];

let price_cols = ["ask_price", "bid_price", "wap"];

book[update {
    map (i, a) in price_cols, (j, b) in price_cols
        where i > j
        => `${a}_${b}_imb` = (get(a) - get(b)) / (get(a) + get(b))
}];

let generated = columns(book);
generated[filter name != "row_id"];

Language tour

Everything you need, nothing you don’t

A complete analytical pipeline in a handful of composable clauses.

Load & Filter

External functions bring data in

extern fn declares a C++ data-source function as a first-class Ibex binding. The compiler resolves it at link time; the REPL loads the corresponding plugin .so at runtime.

Filter expressions support arithmetic, comparisons, and boolean logic (and, or, not). Multiple clauses chain in reading order.

Output plugins mirror the input API: write_csv, write_json, and write_parquet accept a DataFrame and a path and return the number of rows written.

The dedicated I/O guide covers CSV and Parquet usage in more detail, plus SQLite reads through ADBC and Kafka/Redpanda streaming into WebSocket dashboards for both JSON and Schema-Registry-backed Avro, including headerless files, custom delimiters, schema hints, and end-to-end demos.

extern fn read_csv(
    path: String,
    nulls: String = "",
    delimiter: String = ",",
    has_header: Bool = true
) -> DataFrame
    from "csv.hpp";
extern fn read_json(path: String) -> DataFrame
    from "json.hpp";

let prices = read_csv("prices.csv");
let headerless = read_csv("ticks.csv", has_header = false);
let config = read_json("config.json");

// Keep rows where price exceeds 1.0
let active = prices[filter price > 1.0];

// Chain filter with a column projection
prices[
    filter price > 1.0,
    select { symbol, price, volume }
];

// Write results back to disk
import "csv";
import "json";
import "parquet";

let n = write_csv(active, "out/active.csv");
write_json(active, "out/active.json");
write_parquet(active, "out/active.parquet");

The bundled Parquet plugin also reads directly from public HTTPS URLs, presigned URLs, and S3-compatible object storage with the same read_parquet(path) call: read_parquet("https://data.example.com/prices.parquet") or read_parquet("s3://market-data/prices.parquet?region=us-east-1").

Inline Tables

Build DataFrames from literals or existing column vectors

The Table { … } constructor builds a DataFrame column-by-column. Each column value is either an inline Series literal [v, v, …] or any Ibex expression that produces a series or table — no file or plugin declaration required.

For table-valued expression columns, the runtime extracts the column from the result table: a single-column result is used directly; a multi-column result is matched by column name. Literal and expression columns may be freely mixed.

All columns must have equal row counts. The result is a first-class DataFrame: chain operations, promote to as_timeframe, or join with another table.

// Series literals
let price = [150.0, 140.0, 300.0];
let empty: Series<Float64> = [];

// Series-literal columns
let t = Table {
    symbol = ["AAPL", "GOOG", "MSFT"],
    price  = price,
};

// Columns from existing table expressions
let summary = Table {
    symbol = prices[select { symbol }],
    high   = prices[select { high = max(price) }, by symbol],
    low    = prices[select { low  = min(price) }, by symbol],
};

// Mix literals and expressions
let ref = Table {
    label  = ["open", "close"],
    value  = ohlc[select { value = mean(price) }],
};

// Promote to TimeFrame
let tf = as_timeframe(
    Table { ts = [1000, 2000, 3000], price = [10, 20, 30] },
    "ts"
);

Aggregation

One clause for projection and grouped reduction

select doubles as a projection and an aggregation clause. Add by to group; omit it for a global aggregate.

Aggregate functions: first, last, sum, mean, min, max, count, median, std, ewma. All skip null rows by default. IEEE NaN / Inf are not nulls, so normalize them explicitly before aggregation when needed.

// Mean sepal length and row count, by species
iris[select {
    mean_sl = mean(`Sepal.Length`),
    n       = count()
}, by Species];

// OHLC per symbol — all in one pass
prices[select {
    open   = first(price),
    high   = max(price),
    low    = min(price),
    close  = last(price),
    traded = sum(volume)
}, by symbol];

Grouped Update

Broadcast group aggregates back to every row

update + by evaluates each expression per group and writes the result back to every row in that group — no row reduction, no separate join step. This is the equivalent of a SQL window function with PARTITION BY and no frame clause.

Use it to attach group statistics (mean, z-score, rank) alongside the original columns without losing any rows.

// Attach species mean to every individual row
iris[update { group_mean = mean(`Sepal.Length`) }, by Species];

// Z-score within each symbol group
prices[update {
    z = (price - mean(price)) / std(price)
}, by symbol];

// Combine grouped update with filter — keeps all original rows
prices[
    update { avg = mean(price) }, by symbol,
    filter price > avg
];

Derived Columns

Add or replace columns without losing any data

update appends named computed columns to every row. All existing columns pass through untouched. A tuple left-hand side unpacks a multi-column result in one assignment. The bare update = expr form merges all columns from a table-returning expression.

// Enrich every row with return, range, and notional
ohlcv[update {
    ret      = (close - open) / open,
    range    = (high  - low)  / open,
    notional = close * volume
}];

// Unpack a multi-column result with a tuple LHS
df[update { (delta, gamma) = compute_greeks() }];

// Merge all columns from a table-returning function
prices[update = gen_prices(symbols)];

// Chain an update with an aggregation
let daily = ohlcv[update { ret = (close - open) / open }];
daily[select {
    avg_ret    = mean(ret),
    n_sessions = count()
}, by sector]
    [order { avg_ret desc }];

Rename

Relabel columns without touching the data

rename maps old column names to new ones while keeping every other column intact. Use the single-name shorthand or the braced multi-rename form. Column order is preserved.

Because rename runs before select in canonical order, renamed columns are immediately visible to the rest of the pipeline.

// Single rename — no braces needed
trades[rename p = price];

// Rename multiple columns at once
trades[rename { p = price, q = volume }];

// Rename then select using the new name
trades[
    filter price > 15,
    rename p = price,
    select { p, volume }
];

// Works with update and aggregation too
trades[
    rename { sym = symbol, vol = volume },
    select { total_vol = sum(vol) }, by sym
];

Order, Head, Tail & Distinct

Sort, keep the first or last rows, and deduplicate with explicit intent

order accepts a single column, a multi-key block with asc / desc annotations, or no argument at all to sort by every column in schema order.

head keeps the first n rows in the current order, while tail keeps the last n rows. Both work per group when combined with by.

distinct deduplicates on a single column or a set.

// Sort by a single key (ascending by default)
iris[order `Sepal.Length`];

// Multi-key sort with explicit directions
results[order { avg_ret desc, symbol asc }];

// Keep the first 10 rows in current order
results[head 10];

// Top 3 rows per symbol after sorting
results[order { avg_ret desc }, head 3, by symbol];

// Keep the last 10 rows in current order
results[tail 10];

// Bottom 3 rows per symbol after sorting
results[order { avg_ret desc }, tail 3, by symbol];

// Sort by all columns in schema order
iris[order];

// Unique species names
iris[distinct Species];

// Unique (species, length) pairs
iris[distinct { Species, `Sepal.Length` }];

Filter With Lag

Use previous or next rows directly in predicates

Pure row-preserving functions such as lag and lead can appear directly inside filter. They use the current row order, so add a preceding order block when the input order needs to be explicit. The offset can be an integer literal or scalar Int64 expression.

This keeps shift-style patterns concise without materializing temporary columns first. Date - Date returns an integer day delta, so calendar-adjacent filters can combine lag(date_col, 1) with date arithmetic directly.

// Numbers that appear at least three times consecutively
let n = 1;
logs[
    order id
][
    filter num == lag(num, n) && num == lag(num, n + 1)
][
    distinct { ConsecutiveNums = num }
];

Rank

Assign dense, first, min/max, or average ranks inside update

rank(col) is the shorthand single-key form. For multi-key ranking, use rank(order { ... }) and spell sort direction per key.

The surrounding by clause partitions ranking by group, which makes dense top-k filters much cleaner than distinct + order + head + semi join rewrites.

// Dense descending rank by score within each symbol
scores[
    update { dense_rank = rank(score, method = dense, ascending = false) },
    by symbol
];

// Multi-key ranking with an explicit tie-break
scores[
    update { order_rank = rank(order { score desc, ts asc }, method = first) },
    by symbol
];

// Top 3 dense ranks per department
employee[
    update { salary_rank = rank(salary, method = dense, ascending = false) },
    by departmentId
][filter salary_rank <= 3];

Joins

Inner, left, right, outer, semi, anti, cross, as-of — and non-equijoin predicates

The as-of join attaches the latest right row at-or-before each left timestamp. It is the standard pattern for enriching tick data with bar data, without look-ahead bias.

Join keys are named with on. When the expression after on is a comparison or boolean expression rather than a column name, it becomes a non-equijoin predicate (theta join) evaluated for every pair of rows. Both tables must be TimeFrames for an asof join.

// Inner join — drop non-matching rows
let enriched = daily join fund on symbol;

// Left join — preserve all left rows
let with_meta = prices left join metadata on symbol;

// Right join — preserve all right rows
let with_scores = prices right join scores on symbol;

// Outer join — preserve rows from both sides
let unioned = prices outer join metadata on symbol;

// Semi join — keep left rows that have a match
let matched = prices semi join metadata on symbol;

// Anti join — keep left rows that have no match
let missing = prices anti join metadata on symbol;

// Cross join — cartesian product
let cartesian = prices cross join metadata;

// Non-equijoin — pair rows where a column satisfies an inequality
let events = ticks join windows on ts >= start && ts < end;

// As-of join — each tick gets the latest bar at or before ts
let tf   = as_timeframe(ticks,  "ts");
let bars = as_timeframe(bars_1m, "ts");
tf asof join bars on ts;

Subqueries

Compare a row against an aggregate of the rows that relate to it

Inside a filter, scalar(...) holds a table expression whose single aggregate column is compared against the outer row. outer(column) captures a column from the enclosing query — it is the correlation boundary. A bare name inside the subquery always means the subquery’s own column, so the two sides can share names without ambiguity.

The subquery is decorrelated, never run once per outer row: a correlated one becomes an aggregate grouped by the captured column, left-joined back onto the outer rows; an uncorrelated one becomes a cross join against its single row. When nothing matches an outer row’s key the value is null, and comparing against null keeps no rows — SQL’s scalar-subquery semantics.

// Each part's cheapest supplier — TPC-H Q2's shape.
// The minimum is taken over the supply rows whose ps_partkey
// matches THIS row's p_partkey.
(parts join supply on p_partkey)[
    filter ps_supplycost == scalar(
        supply[
            filter ps_partkey == outer(p_partkey),
            select { m = min(ps_supplycost) }
        ]
    )
];

// Capture nothing and it is UNcorrelated: one value, the
// same for every row, evaluated once and broadcast.
supply[
    filter ps_supplycost > scalar(supply[select { m = mean(ps_supplycost) }])
];

// The subquery is a filter, not a projection: its value
// never becomes a column of the result.

Null Handling

SQL-style three-valued logic, Arrow-style storage

Every column carries an Arrow-compatible validity bitmap alongside its value buffer. Null propagates through arithmetic, comparisons, and every value function — null in, null out: null * x = null, null > 5 = null, abs(null) = null, is_nan(null) = null. The only functions that consume null are the ones built for it: is null / is not null, coalesce, fill_null, and null_if_nan / null_if_not_finite. filter silently drops rows where the predicate is null.

Use is null / is not null to inspect the validity bitmap directly — these predicates always return a valid Bool, never null. Three-valued boolean operators follow SQL rules: true OR null = true, false AND null = false.

Nulls arise from joins with unmatched rows (left join: unmatched right columns; right join: unmatched left columns; outer join: both sides), dcast (missing pivot combinations), and aggregates over groups with no valid observations — every value-bearing aggregate (sum, mean, first, last, …) returns null for an all-null group, plus std on fewer than 2 non-null values, skew on fewer than 3, kurtosis on fewer than 4. Only count never returns null.

Floating-point NaN and Inf values are separate from nulls. Use null_if_nan or null_if_not_finite to convert them into nulls before grouped reductions.

// Left join introduces nulls for unmatched rows
let enriched = employees left join departments on dept_id;

// Test the validity bitmap with IS NULL / IS NOT NULL
enriched[filter { dept_name is null }];
enriched[filter { dept_name is not null }];

// Null propagates through arithmetic
// bonus is null wherever budget is null
enriched[select { name, bonus = salary + budget }];

// 3VL: true OR null = true; keeps the row
enriched[filter { dept_name is not null || salary > 80000 }];

// Aggregates skip null rows automatically
enriched[select { avg_sal = mean(salary) }, by dept_name];

Scalar Functions

Math and date extraction built in

Ibex ships a small set of built-in scalar functions that apply element-wise to a column. Math: abs, log, sqrt, row-wise pmin, row-wise pmax. Date/time extraction: year, month, day, hour, minute, second.

All scalar functions operate on Numeric or Date | Timestamp columns and produce a new column of the return type. Use them inside update, select, or filter expressions.

// Math functions
trades[update {
    log_ret  = log(price / lag(price, 1)),
    vol      = sqrt(variance),
    notional = abs(pnl),
    hi       = pmax(bid, ask, wap),
    lo       = pmin(bid, ask, wap)
}];

// Date / time extraction
trades[update {
    yr = year(date),
    mo = month(date),
    dy = day(date)
}];

trades[update { hr = hour(timestamp) }];

// Use in filter
trades[filter year(date) = 2024];

String Matching

`like` matches a string against a pattern

like(value, pattern) is SQL-LIKE matching — (String, String) -> Bool. The pattern must cover the whole value: % matches zero or more characters, _ matches exactly one code point, and \ escapes either of them. Matching is case-sensitive.

There is no infix LIKE keyword; NOT LIKE is the ordinary ! operator. Because like returns Bool, it works as a field expression as well as a predicate. Null propagates: a null value or pattern yields null, and a null predicate keeps no row.

// Substring, suffix, prefix
parts[filter like(p_name, "%green%")];
parts[filter like(p_type, "%BRASS")];
parts[filter like(p_container, "SM %")];

// NOT LIKE, with two ordered fragments
orders[filter !like(o_comment, "%special%requests%")];

// Bool-valued, so it is also a field
parts[update { is_green = like(p_name, "%green%") }];

// _ is one character; \ escapes a literal % or _
parts[filter like(code, "A_C")];
sales[filter like(label, "100\\%")];

Type System

Explicit casts and rounding for numeric conversions

Ibex never coerces Int ↔ Float implicitly. Use type-name constructors — Int64(x), Float64(x), etc. — for explicit numeric casts. They work on scalars and on entire columns.

Float → Int casts require the value to be a whole number. Use round(x, mode) first when the value may have a fractional part. Five modes are available: nearest (ties away from zero), bankers (ties to even — statistically unbiased), floor, ceil, trunc.

Type annotations on let bindings and function parameters are optional. When present, they are validated: scalar types must match exactly; schema annotations require all declared columns to be present with the correct types.

// Explicit scalar casts
let n: Int64   = Int64(3.0);   // ok — 3.0 is a whole number
let f: Float64 = Float64(42);   // ok — Int → Float always succeeds
// Int64(3.9)  ← runtime error: fractional part

// Round a Float column to Int
prices[update { vol_int = round(volume_f, nearest) }];

// All five rounding modes
round(3.7, nearest)   // → 4  (ties away from zero)
round(2.5, nearest)   // → 3  (tie → away from zero)
round(2.5, bankers)   // → 2  (tie → nearest even)
round(3.5, bankers)   // → 4  (tie → nearest even)
round(3.7, floor)     // → 3  (toward −∞)
round(3.7, ceil)      // → 4  (toward +∞)
round(3.7, trunc)     // → 3  (toward zero)

// Column-level cast after rounding
prices[update { vol_int = Int64(round(volume_f, nearest)) }];

// Annotation validation
let y: Float64 = Float64(42);   // explicit cast satisfies annotation

Vectorized RNG

Generate random columns in a single vectorized pass

Eight built-in rand_* functions each produce one independent draw per row — no row-by-row loop, no user-space scalar overhead. Each thread has its own mt19937_64 seeded from std::random_device, so parallel queries produce independent streams without locking.

Continuous distributions return Float64: rand_uniform, rand_normal, rand_student_t, rand_gamma, rand_exponential. Discrete distributions return Int64: rand_bernoulli, rand_poisson, rand_int.

// Gaussian noise column
df[update { noise = rand_normal(0.0, 1.0) }];

// Uniform weight, biased coin, die roll — all in one pass
df[update {
    w    = rand_uniform(0.0, 1.0),
    flip = rand_bernoulli(0.7),
    die  = rand_int(1, 6)
}];

// Simulate inter-arrival times and Poisson counts
df[update {
    wait   = rand_exponential(2.5),
    events = rand_poisson(4.0)
}];

rep & Boolean Masks

Fill, repeat, and cycle columns with named-argument syntax

rep(x) mirrors R’s rep(): it fills a column by repeating a scalar literal or cycling an existing column. Named arguments — times, each, and length_out — control the exact repetition pattern.

Passing a Bool literal produces a first-class Series<Bool> (a boolean mask), useful as a computed flag or validity indicator alongside numeric columns.

// Constant-fill: all rows set to zero
df[update { zero = rep(0) }];

// All-true boolean mask
df[update { active = rep(true) }];

// Repeat each element of a column twice
df[update { rep2 = rep(price, each=2) }];

// Named args: times, each, length_out
df[update { flag = rep(flag_col, times=50) }];

Cumulative Functions

Running totals and products in a single O(n) pass

cumsum(col) and cumprod(col) compute prefix sums and products over any Int or Float column. Both return the same type as their input and are valid in select and update blocks — DataFrame or TimeFrame — with or without a window clause.

Typical use: cumulative P&L, compounded returns, or an index-rebased price series.

// Valid in select (produces only the cumulative column)
df[select { cum_pnl = cumsum(pnl) }];

// Valid in update (adds column alongside existing ones)
df[update { cum_pnl = cumsum(pnl) }];

// Compounded return: product of (1 + daily_ret)
df[update {
    growth = cumprod(1.0 + daily_ret)
}];

// Also valid on a TimeFrame (no window clause needed)
tf[update { running_vol = cumsum(volume) }];

Null / NaN Cleanup

Clean missing and dirty float values before aggregation

Three built-in functions cover data.table’s nafill semantics in a single O(n) pass:

fill_null(col, value) — constant fill
fill_forward(col) — LOCF (last observation carried forward)
fill_backward(col) — NOCB (next observation carried backward)

All three work on any column type (Int, Float, String, …) in select or update blocks. Unfillable leading (LOCF) or trailing (NOCB) nulls remain null.

For dirty Float64 columns, Ibex also provides is_nan(x), null_if_nan(x), and null_if_not_finite(x). These are the right way to make grouped mean / sum ignore NaN and infinities. All of these (and coalesce) are ordinary row-wise functions: they take any expression, compose inside larger expressions (fill_null(price, 0) * qty), and follow the null rule — is_nan(null) is null, not false (“NaN or missing?” is is_nan(x) || x is null).

// Constant fill: replace null prices with 0
df[update { price = fill_null(price, 0) }];

// LOCF: carry last known price into gaps
df[update { price = fill_forward(price) }];

// NOCB: fill backwards from next available value
df[update { price = fill_backward(price) }];

// Chain: LOCF then fill remaining leading nulls with 0
df[update { price = fill_forward(price) }]
  [update { price = fill_null(price, 0) }];

// Float cleanup: turn NaN / Inf into null before aggregation
quotes[update {
    px_bad   = is_nan(px),
    px_clean = null_if_not_finite(px)
}]
[select { avg_px = mean(px_clean) }, by symbol];

// Direct aggregate expressions are also allowed
quotes[select { avg_notional = mean(price * volume) }, by symbol];
quotes[select { avg_px = mean(null_if_nan(px)) }, by symbol];

Rolling Windows

Time-based rolling aggregates in one O(n) pass

as_timeframe validates sort order in O(n) and records the time-index column. window <dur> sets the lookback; rolling functions (rolling_sum, rolling_mean, rolling_count, lag) use a two-pointer scan with no per-row heap allocation.

Duration literals: 1s, 30s, 1m, 5m, 1h, …

let tf = as_timeframe(ticks, "ts");

// Previous tick's price
tf[update { prev_price = lag(price, 1) }];

// Tick count in the last 60 seconds
tf[window 1m, update { ticks_1m = rolling_count() }];

// Multiple rolling aggregates in one pass
tf[window 5m, update {
    sum_5m  = rolling_sum(price),
    mean_5m = rolling_mean(price)
}];

Resample

Aggregate ticks into equal-width time buckets

resample <dur> floors timestamps into fixed-width intervals and reduces each bucket to one output row. Combine with by for per-symbol bars.

The output TimeFrame carries the bucket start time as its time index, ready for downstream joins or further resampling.

let tf = as_timeframe(ticks, "ts");

// 1-minute OHLC bars
let bars = tf[resample 1m, select {
    open  = first(price),
    high  = max(price),
    low   = min(price),
    close = last(price)
}];

// Per-symbol 1-minute bars
tf[resample 1m, select {
    open  = first(price),
    close = last(price)
}, by symbol];

// Enrich ticks with the latest bar's close
tf asof join bars on ts;

Reshape

Pivot and unpivot with melt & dcast

melt converts a wide DataFrame to long format: id columns stay fixed while measure columns are unpivoted into variable / value rows. Combine with select to choose specific measure columns.

dcast is the inverse — it pivots a long table back to wide format. Distinct values of the pivot column become new column names. Combine with by for row keys and select for the value column. Missing cells are filled with null.

// Wide → long: unpivot all columns except symbol
ohlc[melt symbol];
// → symbol | variable | value
//   AAPL   | open     | 150.0
//   AAPL   | high     | 155.0
//   ...

// Only unpivot specific columns
ohlc[melt symbol, select { open, close }];

// Long → wide: pivot back
long[dcast variable, select value, by symbol];
// → symbol | open  | high  | low   | close

// Roundtrip: melt then dcast recovers original data
let long  = wide[melt symbol];
let wide2 = long[dcast variable, select value, by symbol];

Scalar & Codegen

Extract single values and compile to C++

scalar pulls one typed value out of a single-cell result table. It is available as a binding in subsequent expressions.

ibex_compile transpiles a .ibex file to a self-contained C++ source file. The helper script compiles and links it in one step.

// Pull a single value from an aggregate
let total = scalar(
    prices[select { total = sum(price) }],
    total
);

// Use it in subsequent expressions
prices[update { weight = price / total }];

# Compile and run in one step
scripts/ibex-run.sh examples/quant.ibex

Stream

Continuous pipelines with no new syntax

Stream { } wires a source extern, an anonymous transform block, and a sink extern into a continuous event loop. The transform is written in plain Ibex — the same clauses used for batch queries.

When the transform contains resample, the runtime automatically switches to TimeBucket mode: it buffers rows and emits one output batch per closed time bucket. Otherwise it operates PerRow, forwarding each incoming batch immediately.

In TimeBucket mode the bucket is flushed by whichever trigger fires first: a wall-clock check (the configured duration has elapsed since the bucket opened) or a data-timestamp check (an incoming row carries a later bucket's timestamp). The wall-clock trigger gives prompt delivery for real-time feeds; the data-timestamp trigger handles historical replay correctly.

Source timeout contract: the wall-clock check runs only when the source call returns. Sources should use a short internal timeout and return StreamTimeout{} when no data arrives — the runtime fires the flush check and immediately calls the source again. Ibex does not buffer data on behalf of the source; whether messages arriving during this window are preserved depends on the transport. For OS kernel-backed sockets (UDP, TCP) the kernel's receive buffer holds packets safely. For user-space transports use StreamBuffered (see below).

Use import to load a library stub that registers the extern source and sink functions as a plugin.

// Load the UDP plugin (registers udp_recv / udp_send)
import "udp";

// Receive ticks, resample to 1-minute OHLC bars, forward over UDP
let ohlc_stream = Stream {
    source    = udp_recv(9001, "ts:ts,symbol:str,price:f64,volume:i64"),
    transform = [resample 1m, select {
        open  = first(price),
        high  = max(price),
        low   = min(price),
        close = last(price)
    }],
    sink = udp_send("127.0.0.1", 9002)
};

// PerRow stream: forward filtered rows with no buffering
let live_filter = Stream {
    source    = udp_recv(9001, "ts:ts,symbol:str,price:f64,volume:i64"),
    transform = [filter price > 100.0, rename p = price],
    sink      = udp_send("127.0.0.1", 9003)
};

// WebSocket sink: push OHLC bars to browser dashboards
import "websocket";

let ws_stream = Stream {
    source    = udp_recv(9001, "ts:ts,symbol:str,price:f64,volume:i64"),
    transform = [resample 1m, select {
        open  = first(price),
        high  = max(price),
        low   = min(price),
        close = last(price)
    }],
    // ws_send starts a TCP listener; browsers connect with new WebSocket("ws://…")
    sink = ws_send(8080)
};

StreamBuffered

A ready-made producer queue for in-process sources

For in-process or user-space transports the plugin needs a thread-safe queue between its producer and the Ibex event loop. StreamBuffered provides one out of the box — a lockless SPSC ring buffer paired with a compatible ExternFn — so the plugin author doesn't have to implement their own.

Use make_buffered_source(producer_fn) to let the Ibex query control the capacity: the plugin only supplies the data-production logic; the ring is lazily initialised on the first event-loop call using the capacity argument from the query. A producer thread is started at that point, calls buf.write(table) (blocking on backpressure), and signals completion with buf.close(). The event loop drains the ring, returning StreamTimeout{} when empty so wall-clock bucket flushes still fire on schedule.

Cache-line-separated std::atomic head and tail indices keep producer and consumer on independent cache lines — no false sharing, no mutex on the hot path.

When to use each approach: for UDP/TCP sockets the OS kernel already manages a receive buffer (SO_RCVBUF) independently of the application — no application-level queue is needed. Return StreamTimeout{} directly from recvfrom with a short socket timeout instead. StreamBuffered is for in-process queues, shared-memory channels, or any transport without a kernel-managed receive buffer.

// C++ plugin — only the data-production logic
#include <ibex/runtime/stream_buffered.hpp>

registry.register_table("my_src",
    ibex::runtime::make_buffered_source([](ibex::runtime::StreamBuffered& buf) {
        for (auto& batch : my_data_source) {
            buf.write(batch);  // yields if ring full
        }
        buf.close();
    }));

// Ibex query — capacity is a first-class tuning parameter
extern fn my_src(capacity: Int) -> TimeFrame from "plugin.hpp";
extern fn my_sink(df: DataFrame) -> Int from "plugin.hpp";

Stream {
    source    = my_src(512),
    transform = [resample 1s, select { close = last(price) }],
    sink      = my_sink()
};

Model Specification

R-style formula syntax for regression, built into the query bracket

The model clause lets you fit OLS, ridge, WLS, or the plugin-backed lightgbm method regressions with a familiar y ~ x1 + x2 formula. Ibex builds the design matrix automatically — numeric columns pass through, categorical columns are dummy-encoded, and interaction terms (:) and crossing (*) expand just like in R.

Results are accessed via S3-style accessor functions: model_coef, model_summary, model_fitted, model_residuals, model_importance, and model_r_squared. The method = parameter is optional and defaults to ols; it also supports built-ins (ols, ridge, wls) plus plugin methods registered by name. The bundled lightgbm plugin links real Microsoft LightGBM and trains gradient-boosted trees — use model_fitted for in-sample predictions, model_importance for per-feature gains, and model_predict(m, newdata) to score new data by reusing the trained model.

// OLS with filter — all in one bracket
let m = prices[
    filter volume > 1000000,
    model { close ~ open + high + low }
];

// Accessor functions for detailed results
let coefs = model_coef(m);
let stats = model_summary(m);
let r2    = model_r_squared(m);

// Ridge regression with L2 penalty
let ridge = prices[model {
    close ~ open + volume,
    method = ridge,
    lambda = 0.1
}];

// Real LightGBM gradient-boosted trees (load plugin first)
import "lightgbm";
let boosted = prices[model {
    close ~ open + high + low + volume,
    method = lightgbm,
    iterations = 300,
    learning_rate = 0.03
}];
let preds = model_fitted(boosted);      // in-sample predictions
let gains = model_importance(boosted);  // term | gain
let scored = model_predict(boosted, new_prices);  // score new data

Performance

Single-core speed that stays visible at scale

Latest generated benchmark page, 2026-06-02, on r7i.2xlarge Sapphire Rapids hardware. Ibex is single-threaded; Polars, DuckDB, DataFusion, and ClickHouse use all 8 vCPUs where their engines do so by default. The full interactive table is on the benchmarks page.

Aggregation — 16 M rows

Query	Ibex	Polars	Polars ST	DuckDB	DataFusion
mean by symbol	33.6 ms	60.5 ms	216 ms	70.3 ms	39.3 ms
count by sym×day	30.6 ms	220 ms	660 ms	77.6 ms	58.1 ms
mean by sym×day	39.9 ms	257 ms	830 ms	80.0 ms	61.6 ms
OHLC by sym×day	71.0 ms	282 ms	932 ms	114 ms	186 ms
sum by user (100K str)	117 ms	305 ms	970 ms	180 ms	118 ms

Ibex wins the common low-cardinality and time-bucketed aggregation cases on one core. DataFusion and ClickHouse still win selected wide or highly parallel cells in the full suite.

TimeFrame — 16 M rows, 1 s spacing

Operation	Ibex	Polars	Polars ST	DuckDB	ClickHouse
rolling count 1m	42.0 ms	71.2 ms	205 ms	582 ms	572 ms
rolling sum 1m	49.7 ms	152 ms	283 ms	1.37 s	882 ms
rolling mean 5m	49.8 ms	163 ms	292 ms	1.61 s	1.31 s
rolling median 1m	1.36 s	3.29 s	3.41 s	6.19 s	11.05 s
rolling std 1m	239 ms	329 ms	455 ms	1.18 s	1.39 s
resample 1m OHLC	101 ms	104 ms	126 ms	280 ms	79.2 ms

Rolling operations use a two-pointer O(n) scan with a single result-column allocation. ClickHouse currently wins the resample row; Ibex remains close to Polars on that query and leads the rolling rows shown here.

Editor support

Syntax highlighting for VS Code

A TextMate grammar covering keywords, types, built-in functions, duration literals, backtick-quoted column names, and comments.

Install — WSL

cp -r editors/vscode \
  /mnt/c/Users/<username>/.vscode/extensions/ibex-language-0.1.0

Install — macOS / native Linux

cp -r editors/vscode \
  ~/.vscode/extensions/ibex-language-0.1.0

Fully restart VS Code after copying. .ibex files are highlighted automatically.