OrionBelt Semantic QL (OBSQL)¶

Semantic QL is OrionBelt's natural SQL surface — BI-style SQL written against a per-model virtual table and translated to a QueryObject that flows through the standard compilation pipeline. The short brand form is OBSQL; both terms refer to the same language. It's the third member of the OrionBelt trio:

Short form	Full name	What it is
OBSL	OrionBelt Semantic Layer	The system
OBML	OrionBelt Modeling Language	YAML format for defining models
OBSQL	OrionBelt Semantic QL	SQL surface for querying them

Semantic QL is the same SQL dialect on two transports:

REST: POST /v1/sessions/{id}/query/semantic-ql (and the top-level shortcut POST /v1/query/semantic-ql in single-model mode)
Arrow Flight SQL: any plain SQL the Flight server receives is classified by FROM target. Targets that match the model's virtual table are translated through the same path. See the Flight server section below.

Why a virtual table?¶

BI tools (Tableau, Power BI, Metabase, Superset, DBeaver) compose SQL from a catalog tree and a column picker. They cannot author OBML YAML. Exposing the semantic model as one virtual table per model, with columns = dimensions + measures + metrics, gives those tools exactly the SQL surface they expect — without leaking warehouse joins to the user.

Semantic QL is OBSL's take on a well-trodden pattern — Cube SQL API, dbt Semantic Layer JDBC, AtScale, and Snowflake SEMANTIC_VIEW(...) all expose semantic models as virtual SQL surfaces. The differentiator: Semantic QL ships with explicit measure markers (MEASURE("X")), aggregate-wrap matching (SUM(sum_measure) validated against the declared aggregation), and first-class hierarchical subtotals (WITH ROLLUP / WITH CUBE) — see the corresponding sections below.

Accepted shape¶

SELECT <dimension or measure labels>
[FROM  <model_name>]
[WHERE <predicates>]
[HAVING <predicates>]
[WITH ROLLUP | WITH CUBE]
[ORDER BY <label or position> [ASC | DESC] [NULLS FIRST | NULLS LAST] [, ...]]
[LIMIT <n>]
[OFFSET <n>]

FROM is optional on a connection where the model is implicit (single-model mode, or multi-model with the database selector set). SELECT "dim", "measure" means the same as SELECT "dim", "measure" FROM <model> — see No-FROM mode below.

WITH ROLLUP / WITH CUBE is accepted either trailing or in its spec-correct slot before ORDER BY / LIMIT.

Clause	Rules
`SELECT`	Bare identifiers, `MEASURE("<label>")`, or an aggregate wrapper that matches the measure's declared aggregation (`SUM("X")` on a SUM-measure, `COUNT(DISTINCT "X")` on a count_distinct-measure, etc.). Mismatched wraps (`MIN` on a SUM-measure) and any wrap on a metric reject with a message naming the declared aggregation. `SELECT *` and free-form expressions are rejected.
`FROM`	The model's virtual table, or omitted entirely. Any other target is rejected with `RAW_SQL_REJECTED` — OBSL is closed by design, there is no env flag to bypass.
`WHERE`	`column op literal` atoms joined by `AND`. Measure / metric references are auto-routed to `HAVING`. Top-level `OR` is rejected.
`HAVING`	Same shape as `WHERE`; passes through unchanged.
`GROUP BY`	Silently ignored — implicit from the dimensions in `SELECT`. BI tools auto-emit it; we tolerate it for compatibility.
`ORDER BY`	Identifier (must be a SELECT alias) or 1-based position. Optional `ASC`/`DESC` and `NULLS FIRST`/`NULLS LAST`. Without the null position, dialect defaults apply (Postgres / Snowflake / DuckDB / Dremio put NULLs last on ASC; MySQL / ClickHouse / BigQuery / Databricks the opposite — set it explicitly for portable behavior).
`LIMIT`	Integer literal.
`OFFSET`	Integer literal. Useful for keyset / page pagination after `ORDER BY`.
`WITH ROLLUP` / `WITH CUBE`	Trailing modifier (or spec-position before `ORDER BY`) — see below.

Selecting measures: three accepted forms¶

The translator accepts three syntactically distinct ways to reference a measure or metric in SELECT:

-- 1. Bare label — terse, recommended for hand-written SQL
SELECT "Region", "Total Sales" FROM sales_model

-- 2. MEASURE() marker — Snowflake SEMANTIC_VIEW / Databricks metric-view syntax
SELECT "Region", MEASURE("Total Sales") FROM sales_model

-- 3. Aggregate wrapper that matches the measure's declared aggregation
SELECT "Region", SUM("Total Sales") FROM sales_model

All three compile to identical vendor SQL — the wrapper is stripped at translation time.

Rule for aggregate wrappers¶

For measures, the wrapping aggregate must equal the measure's declared aggregation. Mismatches reject with a clear error:

Measure declares	Accepted wrap	Rejected example
`sum`	`SUM("X")`	`MIN("X")`, `AVG("X")`
`count`	`COUNT("X")`	`SUM("X")`
`count_distinct`	`COUNT(DISTINCT "X")`	`COUNT("X")`
`avg`	`AVG("X")`	other
`min` / `max`	matching `MIN("X")` / `MAX("X")`	crossing them

Error message names the declared aggregation so the caller can fix it:

[UNSUPPORTED_SQL_FEATURE] Measure Order Count is declared as COUNT — applying SUM would change its math. Use COUNT("Order Count"), bare "Order Count", or MEASURE("Order Count").

Wrappers on metrics¶

Metrics reject every aggregate wrapper — a derived expression like Revenue per Order = SUM(amount) / COUNT(order_id) is already evaluated at the query's grain, and no outer aggregate is mathematically correct. Use bare "Revenue per Order" or MEASURE("Revenue per Order").

[UNSUPPORTED_SQL_FEATURE] Metric Revenue per Order is a derived expression already evaluated at the query's grain — applying SUM(...) would change its math. Use bare "Revenue per Order" or MEASURE("Revenue per Order").

Why this matters¶

BI tools (Tableau, Power BI, Metabase, Superset) emit SUM(measure_col) or COUNT(measure_col) reflexively when you drop a measure on a viz. For SUM-typed measures this now works seamlessly. For metrics, the user sees a concrete error pointing at the right syntax instead of silently getting wrong numbers (e.g., the classic "sum of per-region ratios" trap). Honesty over convenience — the semantic layer exists to make the math right.

Multi-model addressing (v2.4.0+)¶

Start the server with multiple OBML files pre-loaded:

export MODEL_FILES=examples/sales.yaml,examples/returns.yaml,examples/finance.yaml
uv run orionbelt-api

Each model becomes its own Flight SQL catalog. BI tools pick one via the Database field in their connection dialog:

DBeaver:
  Connection → Database: sales

Tableau:
  Same field on the Arrow Flight JDBC connector

psql (pgwire, v2.5.0+):
  postgresql://obsl:KEY@host:5432/sales

pyarrow programmatic:

import pyarrow.flight as flight
options = flight.FlightCallOptions(headers=[(b"database", b"sales")])
info = client.get_flight_info(
    flight.FlightDescriptor.for_command(b'SELECT "Region", "Total Sales" FROM sales'),
    options=options,
)

Model naming rules¶

Each OBML defines its addressing key via a top-level name: field; if unset, the filename stem is used. Both go through identical normalization:

1. lowercase
2. replace runs of [space, dot, dash] with underscore
3. collapse underscore runs
4. strip leading/trailing underscores
5. strip a trailing `_obml` suffix
6. validate against ^[a-z][a-z0-9_]{0,62}$

Examples: My Sales Model → my_sales_model, commerce.v2 → commerce_v2, sales.obml.yaml (filename) → sales.

Reserved names (rejected at startup): obsl, obml, obsql, model, default, public, information_schema, pg_catalog, sqlite_master, mysql, sys, admin, root.

Name collisions across MODEL_FILES entries error at startup with a clear message — no silent shadowing.

Discovery¶

GET /v1/models lists every loaded model:

curl http://localhost:8000/v1/models
{
  "models": [
    {"name": "sales",   "description": "...", "dimensions": 12, "measures": 8, ...},
    {"name": "returns", "description": "...", "dimensions": 9,  "measures": 5, ...}
  ],
  "count": 2
}

Auto-resolve¶

If exactly one model is loaded (single-model mode or MODEL_FILES with one entry), no selector is needed — every connection targets it implicitly.

What happens without a selector when ambiguous¶

[NO_MODEL_SELECTED] Multiple models are loaded and no selector was sent
on this connection. Pick one by setting the connection's `database`
field (or `x-obsl-model` header). Available models: returns, sales.

  DBeaver:    Connection → Database field = <name>
  Tableau:    Same field on the Arrow Flight JDBC connector
  pyarrow:    options = flight.FlightCallOptions(
                  headers=[(b'database', b'<name>')])
  REST:       Use /v1/sessions/<name>/query/semantic-ql

Discover available models via GET /v1/models.

The error itself is the documentation — no need to remember anything.

No-FROM mode — implicit model¶

On a connection with a resolved model (single-model, or multi-model with the database selector set), FROM is optional. These two queries are equivalent:

SELECT "Customer Country", "Total Revenue"
SELECT "Customer Country", "Total Revenue" FROM sales_model

Classification routes a no-FROM SELECT to semantic mode when every column identifier matches a dim / measure / metric on the resolved model. Unknown identifiers reject with RAW_SQL_REJECTED — not UNKNOWN_SELECT_ITEM — so users get a clear "this isn't a query against the model" signal rather than a translator error.

SELECT 1, SELECT version(), and other canned scalar probes keep their catalog routing (BI tools rely on these for connectivity checks).

Smoke-test from the terminal¶

The repo ships examples/obsql.py — a ~150-line pyarrow.flight CLI for running OBSQL without a BI tool. Naming follows the psql / snowsql convention (the CLI shares its name with the language it runs).

# Single-model or auto-resolve
uv run python examples/obsql.py 'SELECT version()'
uv run python examples/obsql.py 'SHOW TABLES'
uv run python examples/obsql.py 'SELECT "Region", "Total Sales" LIMIT 5'

# Multi-model — pick with -m / --model
uv run python examples/obsql.py -m sales 'SHOW TABLES'

# Discover loaded models via REST /v1/models
uv run python examples/obsql.py --list --rest-port 8000

# Verify governance
uv run python examples/obsql.py 'DROP TABLE foo'       # → WRITE_OPERATION_REJECTED
uv run python examples/obsql.py 'SELECT * FROM customers'  # → RAW_SQL_REJECTED

The CLI sets the database gRPC metadata header from -m (matching what the Arrow Flight SQL JDBC driver sends when DBeaver's Database field is set), so it exercises the exact same routing path BI tools take.

Raw mode — detail rows via qualified columns¶

Sometimes you need un-aggregated rows from a data object — the OBML "raw mode" shape. Trigger it by writing qualified "DataObject"."column" references in SELECT:

SELECT "Customers"."Customer Name", "Customers"."Country"
FROM   sales_model
WHERE  "Customers"."Country" = 'US'
ORDER  BY "Customers"."Country"
LIMIT  100

Compiles to a QueryObject with select.fields=[...] (no aggregation, no joins). Detection rule:

Every SELECT item must be "<DataObject>"."<column>"
WHERE predicates target qualified columns the same way
DISTINCT is honoured (SELECT DISTINCT "Customers"."Country" FROM ...)
HAVING, GROUP BY, and WITH ROLLUP are rejected in raw mode
Mixing a qualified raw column with a bare dim/measure → MIXED_RAW_AND_AGGREGATE_MODE

Raw mode is the OBSQL equivalent of REST /query/execute with select.fields=[...]. It bypasses the dim/measure abstraction but stays inside the semantic layer — model-defined row-level filters still apply, no joins, no warehouse-side ad-hoc SQL.

Rejected SQL¶

The translator rejects shapes that don't fit the semantic model with stable error codes:

Code	Triggered by
`UNKNOWN_SELECT_ITEM`	SELECT item that's not a known dim / measure / metric.
`UNKNOWN_FILTER_FIELD`	WHERE / HAVING field that's not a known dim / measure / metric.
`UNKNOWN_ORDER_BY_FIELD`	ORDER BY identifier missing from SELECT.
`INVALID_ORDER_BY_POSITION`	Numeric position outside `[1, n]`.
`UNSUPPORTED_SQL_FEATURE`	JOIN, CTE, subquery, UNION, window function, `SELECT *`, aggregate call wrapped around a measure, top-level `OR`.
`RAW_SQL_REJECTED`	Flight: FROM target is not the virtual table and not a catalog source. Raw warehouse SQL is never accepted (no flag to bypass).
`WRITE_OPERATION_REJECTED`	Flight or REST: `INSERT` / `UPDATE` / `DELETE` / `DROP` / `CREATE` / `ALTER` / `TRUNCATE` / `MERGE` / `GRANT` / `REVOKE` / `COMMIT` / `ROLLBACK`. OBSL is read-only.
`MIXED_RAW_AND_AGGREGATE_MODE`	SELECT mixes qualified raw-mode columns (`"DataObject"."column"`) with bare dim/measure labels. Use one form consistently.

Hierarchical subtotals: `WITH ROLLUP` / `WITH CUBE`¶

Add a trailing modifier to compute hierarchical subtotals (ROLLUP) or the full cross-tab (CUBE):

SELECT "Region", "Country", "Total Sales"
FROM   sales_model
WHERE  "Year" = 2025
WITH ROLLUP

The compiler emits the dialect-appropriate form:

Dialect	Emitted SQL
Postgres, Snowflake, Databricks, DuckDB, Dremio, BigQuery, MySQL	`GROUP BY ROLLUP(a, b)` / `GROUP BY CUBE(a, b)`
ClickHouse	`GROUP BY a, b WITH ROLLUP` / `GROUP BY a, b WITH CUBE`

For every selected dimension, a GROUPING(dim) AS _g_<dim> column is appended to the result schema. 0 means the row carries a real value for that dimension; 1 means it was rolled up (NULL value in the dim column). The flag columns are the only reliable way to tell a subtotal row from a detail row whose dim is legitimately NULL.

-- Keep only the country-level subtotals
WHERE _g_Region = 0 AND _g_Country = 1

Measure additivity under rollup¶

OBSL doesn't classify measures by additivity or rewrite their SQL. The database recomputes each grouping level from base rows:

Additive measures (SUM, COUNT): subtotals sum to higher levels and to the grand total.
Non-additive measures (COUNT(DISTINCT), AVG, percentiles, ratios defined as AVG(x/y)): each grouping level is individually correct, but subtotals do not sum to the grand total. Mathematically expected; not a bug.
Weighted ratios (SUM(x) / SUM(y)): roll up correctly at every level.

Restrictions¶

WITH ROLLUP / WITH CUBE require at least one dimension in SELECT.
The two are mutually exclusive.
Combining rollup/cube with total: true measures, period-over-period metrics, or cumulative metrics emits an INCOMPATIBLE_COMBINATION warning. The query still runs but the _g_* flag columns may not appear in the final projection.

REST examples¶

Execute¶

curl -X POST http://localhost:8000/v1/query/semantic-ql \
  -H 'content-type: application/json' \
  -d '{
    "sql": "SELECT \"Region\", \"Total Sales\" FROM sales_model WITH ROLLUP",
    "dialect": "duckdb"
  }'

The response shape matches /v1/query/execute: sql, dialect, columns, rows, explain, plus the freshness-cache metadata.

Compile-only (debugging)¶

curl -X POST http://localhost:8000/v1/query/semantic-ql/compile \
  -H 'content-type: application/json' \
  -d '{ "sql": "SELECT \"Region\", \"Total Sales\" FROM sales_model" }'

The compile response also includes the translated QueryObject JSON under query, so you can see exactly what your SQL became.

Arrow Flight SQL¶

The Flight server classifies every incoming SQL by its first FROM target:

FROM target	Mode	Behavior
`<model_name>` (virtual table)	`semantic`	Translated → compiled → executed
`SHOW TABLES`, `DESCRIBE`, `information_schema.`, `pg_catalog.`, `SELECT version()` / `current_schema()` / `SELECT 1`	`catalog`	Answered from the model; never touches the warehouse
anything else	`rejected`	`RAW_SQL_REJECTED` — no flag to bypass
DDL/DML (`INSERT`, `UPDATE`, `DELETE`, `DROP`, `CREATE`, `ALTER`, `TRUNCATE`, …)	`rejected`	`WRITE_OPERATION_REJECTED` — OBSL is read-only by design

Governance — closed by design¶

There are no env flags to enable raw SQL or write operations. OBSL is a semantic layer, not a JDBC proxy. The only thing the warehouse ever receives is SQL produced by the OBSL compiler:

Source	What reaches the warehouse
Semantic QL via Flight	Compiled SQL from `CompilationPipeline`
`QueryObject` via REST `/query/execute`	Compiled SQL from `CompilationPipeline`
OBML YAML via Flight	Compiled SQL from `CompilationPipeline`
Catalog discovery (SHOW / DESCRIBE / information_schema / pg_catalog)	Nothing — answered from the model in-process
Anything else	Nothing — rejected at the door

Operators cannot accidentally open a hole. There is no "raw SQL" mode, no admin override, no escape hatch. If you need direct warehouse access, use the warehouse's own clients — not OBSL.

The Flight catalog (CommandGetTables, CommandGetColumns, ListFlights) lists the semantic virtual table first with its dimension / measure / metric columns, plus the _dimensions, _measures, and _metrics metadata views. Data-object physical columns are not exposed.

Schema probe shortcut¶

When a Flight client calls GetFlightInfo for a semantic-mode query, the result schema is built directly from the model — no warehouse round-trip is needed to learn the column types. Faster catalog navigation; no spurious EXPLAIN-shaped queries hit the database.

BI tool setup¶

The general recipe across BI tools that speak Flight SQL:

Install the Apache Arrow Flight SQL JDBC .jar.
Connect to jdbc:arrow-flight-sql://<host>:8815?useEncryption=false.
Browse the schema — you'll see one table per loaded model, with columns labelled by dimension / measure / metric.
In the SQL editor, write Semantic QL against that virtual table:

SELECT "Region", "Total Sales"
FROM   sales_model
WHERE  "Year" = 2025
ORDER  BY "Total Sales" DESC
LIMIT  100

The semantic layer takes care of every join, every aggregate, every row- level rule. BI tools see exactly the columns they're allowed to combine.