Methodology & code

How the benchmark works

Everything behind the benchmark numbers is here: how to run the full suite yourself, and the exact code each engine runs for every query. The code is extracted directly from the harness source.

Reproduce it

Run it yourself

Every engine is a stock install — polars, duckdb, datafusion, chdb (ClickHouse) and pandas from PyPI; data.table and dplyr from CRAN. The runner is one script; it generates the synthetic data, runs all engines and writes a single CSV.

# clone, build Ibex in release, then run the whole suite locally:
benchmarking/run_scale_suite.sh --warmup 1 --iters 3
# -> benchmarking/results/scales.csv

# render these pages from that CSV:
python3 benchmarking/gen_website.py benchmarking/results/scales.csv
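
As a rough illustration of what the --warmup and --iters flags control, here is a minimal timing-loop sketch in Python. It is not the harness's actual code: time_query, its parameters and the median summary are hypothetical names and choices made for this example, and the real runner may aggregate its samples differently.

```python
import time
from statistics import median


def time_query(run, warmup=1, iters=3):
    """Time a zero-argument callable (hypothetical helper, not the harness API).

    Warmup runs execute but are not timed; each of the `iters` runs
    that follow contributes one wall-clock sample in seconds.
    """
    for _ in range(warmup):
        run()  # untimed: lets caches and lazy initialisation settle
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return samples


# Stand-in workload; a real benchmark would call an engine's query here.
calls = []
samples = time_query(lambda: calls.append(sum(range(100_000))), warmup=1, iters=3)

# Summarising by median is an assumption of this sketch.
print(f"median {median(samples) * 1e3:.3f} ms over {len(samples)} timed runs")
```

With --warmup 1 --iters 3, a loop of this shape runs each query four times and keeps three timings.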

The published numbers come from a freshly provisioned cloud instance, for isolation: an AWS r7i.2xlarge (8 vCPUs, Sapphire Rapids, 64 GB RAM). One command drives the whole run end to end:

./benchmarking/aws/run.sh --on-demand   # provisions, runs 1M–50M, uploads, shuts down

Know a faster way to write one of these queries? Open a PR against the benchmark harness, and the numbers will be re-run and updated. Improvements to any engine's queries are welcome; the aim is an accurate comparison.

Transparency

Exactly what each engine runs

Pick a query. Each engine's code is verbatim from the file linked beside it; rolling-window frames are shown fully resolved (e.g. the RANGE BETWEEN INTERVAL vs ROWS clause) so the time-window comparison is auditable. polars-st runs identical code to Polars with POLARS_MAX_THREADS=1; ibex+parse is the same Ibex query timed with parsing included.
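
To see why the frame clause matters, the following pure-Python sketch contrasts a row-count frame (ROWS-style) with a time-range frame (RANGE-style) on irregularly spaced timestamps. It is an illustration only and is not taken from the harness: the function names, the sample data and the integer timestamps are assumptions, and timestamps are assumed to be strictly increasing.

```python
from collections import deque


def rolling_sum_rows(values, n):
    """Sum of the current row and the n-1 rows before it (ROWS-style frame)."""
    out, window, total = [], deque(), 0
    for v in values:
        window.append(v)
        total += v
        if len(window) > n:
            total -= window.popleft()  # drop the row that fell out of the frame
        out.append(total)
    return out


def rolling_sum_range(times, values, width):
    """Sum of rows with timestamp in [t - width, t] (RANGE-style frame).

    Assumes `times` is strictly increasing.
    """
    out, window, total = [], deque(), 0
    for t, v in zip(times, values):
        window.append((t, v))
        total += v
        while window[0][0] < t - width:
            total -= window.popleft()[1]  # drop rows older than the time window
        out.append(total)
    return out


# Hypothetical data with a gap between t=2 and t=10.
times = [0, 1, 2, 10, 11]
values = [1, 2, 3, 4, 5]

rows_result = rolling_sum_rows(values, 3)           # [1, 3, 6, 9, 12]
range_result = rolling_sum_range(times, values, 2)  # [1, 3, 6, 4, 9]
print(rows_result, range_result)
```

The two frames agree while the data is evenly spaced and diverge after the gap, which is the difference the resolved frame clauses are meant to make visible.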