Quick terms: a DataFrame is a table; a TimeFrame is a table with a designated time column.
Reference
Ibex function reference
This page lists built-in Ibex functions and standard I/O library functions.
For most operations, prefer the Ibex syntax shown below (for example
df1 join df2 on key) over direct helper function calls.
Ibex syntax
How you usually write these
a join b on key | Inner join (equivalent to inner_join(a, b, key)). |
a left join b on key | Left join (equivalent to left_join(a, b, key)). |
a right join b on key | Right join (equivalent to right_join(a, b, key)). |
a outer join b on key | Outer join (equivalent to outer_join(a, b, key)). |
a semi join b on key | Semi join (equivalent to semi_join(a, b, key)). |
a anti join b on key | Anti join (equivalent to anti_join(a, b, key)). |
df[select { x = sum(col) }, by key] | Standard aggregate usage. |
tf[window 5m, update { x = rolling_mean(price) }] | Typical rolling-window usage. |
df[update { x = fill_forward(x) }] | Typical null-fill usage. |
df[order key] | Standard ordering syntax (instead of order(df, key)). |
import "csv"; read_csv("file.csv") | Typical file I/O usage. |
df[cov] / df[corr] | Covariance or correlation matrix of numeric columns. |
df[transpose] | Swap rows and columns (homogeneous column types required). |
matmul(a, b) | Matrix multiply two DataFrames. |
Core
Table and join functions
Core table functions
as_timeframe(df, "time_col") | Convert a table to a time-indexed table using time_col. |
scalar(df, col) | Extract one value from a one-row table. |
order(df, key1, ...) | Return a table ordered by keys. Usually written as df[order key1, ...]. |
print(value) | Print a human-readable value in scripts and REPL sessions. |
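For instance, a short sketch combining these functions (table and column names such as trades, ts, and price are invented for illustration):

```ibex
tf = as_timeframe(trades, "ts")
avg = scalar(trades[select { m = mean(price) }], m)
print(avg)
```

scalar extracts the single value from the one-row aggregate result, which is convenient when feeding a summary number into a later expression.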
Join functions
inner_join(left, right, key1, ...) | Keep rows with matching keys. Ibex syntax: df1 join df2 on key1. |
left_join(left, right, key1, ...) | Keep all left rows, attach right matches when present. Ibex syntax: df1 left join df2 on key1. |
right_join(left, right, key1, ...) | Keep all right rows, attach left matches when present. Ibex syntax: df1 right join df2 on key1. |
outer_join(left, right, key1, ...) | Keep all rows from both sides. Ibex syntax: df1 outer join df2 on key1. |
semi_join(left, right, key1, ...) | Keep left rows with a right-side match. Ibex syntax: df1 semi join df2 on key1. |
anti_join(left, right, key1, ...) | Keep left rows with no right-side match. Ibex syntax: df1 anti join df2 on key1. |
cross_join(left, right) | Cartesian product. Ibex syntax: df1 cross join df2. |
asof_join(left, right, key1, ...) | Nearest-in-time join for time-indexed tables. Ibex syntax: tf1 asof join tf2 on key1. |
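A hedged end-to-end sketch of the join syntax above (trades, quotes, refdata, and sym are invented names):

```ibex
enriched = trades left join refdata on sym
orphans = trades anti join refdata on sym
pairs = quotes asof join trades on sym
```

The anti join is a common audit step after a left join: it isolates exactly the rows that found no match. For the asof join, both sides would need to be time-indexed tables (TimeFrames).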
Analytics
Aggregates and time functions
Aggregate functions
Use these inside select { ... } (often with by), e.g. df[select { x = sum(col) }, by key].
sum(col) | Sum of non-null values. |
mean(col) | Arithmetic mean of non-null values. |
min(col) | Minimum non-null value. |
max(col) | Maximum non-null value. |
count() | Row count for the current group/window. |
first(col) | First non-null value in order. |
last(col) | Last non-null value in order. |
median(col) | Median of non-null values. |
std(col) | Sample standard deviation (n-1 denominator). |
ewma(col, alpha) | Exponentially weighted moving average. |
quantile(col, p) | p-quantile with linear interpolation. |
skew(col) | Sample skewness. |
kurtosis(col) | Sample excess kurtosis. |
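Aggregates compose inside a single select, one output column per expression. A sketch with invented names:

```ibex
trades[select { total = sum(qty), avg_px = mean(price), p95 = quantile(price, 0.95), n = count() }, by sym]
```

With by sym, each aggregate is computed per group, yielding one row per distinct sym.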
Window and cumulative functions
Rolling functions are usually used with window on a time-indexed table; cumulative functions also work in plain update/select.
rolling_count() | Row count in the active window. |
rolling_sum(col) | Rolling sum over the active window. |
rolling_mean(col) | Rolling mean over the active window. |
rolling_min(col) | Rolling minimum over the active window. |
rolling_max(col) | Rolling maximum over the active window. |
rolling_median(col) | Rolling median over the active window. |
rolling_std(col) | Rolling sample standard deviation. |
rolling_ewma(col, alpha) | Rolling EWMA within each window. |
rolling_quantile(col, p) | Rolling quantile within each window. |
rolling_skew(col) | Rolling sample skewness. |
rolling_kurtosis(col) | Rolling sample excess kurtosis. |
lag(col, n) | Value from n rows earlier (previous values). |
lead(col, n) | Value from n rows later (next values). |
cumsum(col) | Prefix sum, one output per row. |
cumprod(col) | Prefix product, one output per row. |
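A sketch contrasting the two families (ticks, price, and qty are invented names): rolling functions need a window clause on a time-indexed table, while lag/lead and the cumulative functions work in a plain update.

```ibex
ticks[window 5m, update { ma = rolling_mean(price), hi = rolling_max(price) }]
ticks[update { prev = lag(price, 1), run_qty = cumsum(qty) }]
```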
Transforms
Missing data, sequence, and randomness
Null and sequence functions
Most often used in update { ... }, e.g. df[update { x = fill_forward(x) }].
fill_null(col, value) | Replace nulls in col with a constant value. |
fill_forward(col) | Fill missing values using the last earlier non-missing value. |
fill_backward(col) | Fill missing values using the next later non-missing value. |
rep(x, times=1, each=1, length_out=-1) | Repeat or cycle a value/column to build a full output column. |
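A typical gap-filling sketch (quotes, bid, and size are invented names): carry prices forward through gaps, and replace missing sizes with an explicit zero.

```ibex
quotes[update { bid = fill_forward(bid), size = fill_null(size, 0) }]
```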
Vectorized RNG functions
Used in expressions like df[update { noise = rand_normal(0.0, 1.0) }].
rand_uniform(low, high) | Uniform float draws in [low, high). |
rand_normal(mean, stddev) | Normal-distributed float draws. |
rand_student_t(df) | Student-t float draws with df degrees of freedom (here df is the degrees-of-freedom parameter, not a DataFrame). |
rand_gamma(shape, scale) | Gamma-distributed float draws. |
rand_exponential(lambda) | Exponential float draws with rate lambda. |
rand_bernoulli(p) | Bernoulli draws as Int (0/1). |
rand_poisson(lambda) | Poisson draws as Int. |
rand_int(lo, hi) | Uniform integer draws in [lo, hi]. |
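Each RNG call produces one draw per row of the enclosing table, so they slot directly into update expressions. A simulation sketch with invented names:

```ibex
sim = prices[update { noise = rand_normal(0.0, 1.0), jump = rand_bernoulli(0.01) }]
```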
Scalar
Scalar helpers and casts
Scalar/date functions
Use in expressions, filters, and updates, e.g. df[filter year(ts) = 2025].
abs(x) | Absolute value. |
log(x) | Natural logarithm. |
sqrt(x) | Square root. |
year(t) | Year component from Date/Timestamp. |
month(t) | Month component from Date/Timestamp. |
day(t) | Day-of-month component from Date/Timestamp. |
hour(t) | Hour component from Timestamp. |
minute(t) | Minute component from Timestamp. |
second(t) | Second component from Timestamp. |
round(x, mode) | Round Float to Int with mode: nearest, bankers, floor, ceil, or trunc. |
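Scalar and date functions apply element-wise, so they can be mixed freely across filter and update clauses in one expression (prices, ts, and price are invented names; the comma-chained clause form follows the filtered-regression example later on this page):

```ibex
prices[filter year(ts) = 2025, update { logp = log(price), h = hour(ts) }]
```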
Cast constructors
These use regular call syntax in Ibex and appear in expressions and updates.
Int64(x) | Explicit cast to 64-bit integer. |
Int32(x) | Explicit cast to 32-bit integer. |
Int(x) | Alias for Int64(x). |
Float64(x) | Explicit cast to 64-bit float. |
Float32(x) | Explicit cast to 32-bit float. |
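A short sketch (num, den, and id are invented names): casting both operands to Float64 avoids integer division, and Int32 narrows a wider integer column.

```ibex
df[update { ratio = Float64(num) / Float64(den), small_id = Int32(id) }]
```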
Matrix
Matrix operations
These treat a DataFrame as a column-major matrix. Non-numeric columns are silently dropped for cov, corr, and matmul; Int64 columns are widened to Float64. transpose requires all data columns to share the same type.
df[cov] | Sample covariance matrix of all numeric columns. Returns an N×N Float64 table with a leading String label column naming each row. Denominator is n−1. |
df[corr] | Pearson correlation matrix of all numeric columns. Same schema as cov; diagonal values are exactly 1.0. |
df[transpose] | Swap rows and columns. All data columns must share the same type. An optional String or Categorical column is used to name output columns; if absent, columns are named r0, r1, … |
matmul(a, b) | Matrix multiply two DataFrames. Inner dimensions must match. Output column names come from b; row count equals nrow(a). |
Examples
prices[select { open, high, low, close }][cov] | 4×4 covariance matrix of OHLC columns. |
prices[select { open, close }][corr] | 2×2 correlation matrix; off-diagonal is the open/close correlation. |
prices[select { symbol, open, close }][transpose] | Transpose with symbol values as output column names. |
matmul(returns[select { open, close }], weights) | Multiply a returns matrix by a weights column — typical portfolio aggregation. |
Model
Model specification
The model clause fits a regression using R-style formula syntax. Numeric columns pass through to the design matrix; String columns are dummy-encoded (treatment coding). The result is a ModelResult — an opaque type accessed via the functions below.
df[model { y ~ x1 + x2 }] | OLS regression of y on x1 and x2 with intercept. |
df[model { y ~ . }] | Regress y on all other columns (dot notation). |
df[model { y ~ x - 1 }] | No intercept — suppress the constant term. |
df[model { y ~ x1 * x2 }] | Crossing: expands to x1 + x2 + x1:x2. |
df[model { y ~ x1 + x2, method = ridge, lambda = 0.1 }] | Ridge regression with L2 penalty lambda. |
df[model { y ~ x, method = wls, weights = w }] | Weighted least squares using column w as weights. |
Accessor functions
model_coef(m) | Coefficient table with columns term: String and estimate: Float64. |
model_summary(m) | Full summary: term, estimate, std_error, t_stat, p_value. |
model_fitted(m) | Fitted values (ŷ) as a single-column table. |
model_residuals(m) | Residuals (y − ŷ) as a single-column table. |
model_r_squared(m) | R² and adjusted R² as a single-row table. |
Examples
prices[model { close ~ open + volume }] | Simple OLS of closing price on open and volume. |
prices[filter volume > 1000000, model { close ~ open + high + low }] | Filtered regression — only fit on high-volume rows. |
prices[model { close ~ open * volume, method = ridge, lambda = 0.5 }] | Ridge with main effects and interaction term. |
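The accessor functions above take the ModelResult produced by a model clause. A sketch of the typical fit-then-inspect flow, reusing the prices columns from the examples:

```ibex
m = prices[model { close ~ open + volume }]
print(model_summary(m))
resid = model_residuals(m)
```

model_summary gives the full coefficient table with standard errors and p-values; the residuals come back as a single-column table ready for further analysis.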
I/O Libraries
File and stream functions shipped with Ibex
For a focused CSV / Parquet / SQLite / Kafka guide with examples and usage notes, see the I/O page.
CSV and JSON
Typical workflow: import "csv" or import "json", then call these directly.
read_csv(path) | Load a CSV file into a DataFrame. |
write_csv(df, path) | Write a DataFrame as CSV and return row count. |
read_json(path) | Load JSON (array, object, or JSON-lines) into a DataFrame. |
write_json(df, path) | Write a DataFrame to JSON and return row count. |
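A round-trip sketch following the import-then-call workflow described above (file names are invented; the filter clause follows the syntax shown elsewhere on this page):

```ibex
import "csv"
df = read_csv("trades.csv")
n = write_csv(df[filter qty > 0], "clean_trades.csv")
```

write_csv returns the row count, so n here is the number of rows that survived the filter.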
Parquet and stream I/O
Use Parquet for batch files; UDP, WebSocket, and Kafka functions are commonly used in Stream { ... } pipelines.
read_parquet(path) | Load Apache Parquet into a DataFrame. |
write_parquet(df, path) | Write a DataFrame to Parquet and return row count. |
kafka_recv(brokers, topic, group, schema[, options]) | Poll one JSON Kafka message, decode it with an explicit schema, and return a one-row DataFrame or StreamTimeout. |
kafka_send(df, brokers, topic[, options]) | Serialize each DataFrame row to one JSON Kafka message and return sent-row count. |
udp_recv(port) | Read rows from UDP into a DataFrame batch. |
udp_send(df, host, port) | Send a DataFrame batch via UDP and return sent-row count. |
ws_listen(port) | Start a WebSocket listener. |
ws_recv(port) | Receive WebSocket messages as DataFrame batches. |
ws_send(df, port) | Broadcast a DataFrame batch to connected WebSocket clients. |
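A hedged batch-plus-stream sketch using only the signatures listed above (the file name, host, and port are invented; error handling and StreamTimeout checks are omitted):

```ibex
df = read_parquet("ticks.parquet")
udp_send(df, "127.0.0.1", 9000)
batch = udp_recv(9000)
```

In practice the receive side would run in a separate Stream { ... } pipeline rather than the same script as the sender.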