OrionBelt Semantic QL (OBSQL)¶
Semantic QL is OrionBelt's natural SQL surface — BI-style SQL written
against a per-model virtual table and translated to a QueryObject that
flows through the standard compilation pipeline. The short brand form is
OBSQL; both terms refer to the same language. It's the third member of
the OrionBelt trio:
| Short form | Full name | What it is |
|---|---|---|
| OBSL | OrionBelt Semantic Layer | The system |
| OBML | OrionBelt Modeling Language | YAML format for defining models |
| OBSQL | OrionBelt Semantic QL | SQL surface for querying them |
Semantic QL is the same SQL dialect on two transports:
- REST:
POST /v1/sessions/{id}/query/semantic-ql(and the top-level shortcutPOST /v1/query/semantic-qlin single-model mode) - Arrow Flight SQL: any plain SQL the Flight server receives is classified by FROM target. Targets that match the model's virtual table are translated through the same path. See the Flight server section below.
Why a virtual table?¶
BI tools (Tableau, Power BI, Metabase, Superset, DBeaver) compose SQL from a catalog tree and a column picker. They cannot author OBML YAML. Exposing the semantic model as one virtual table per model, with columns = dimensions + measures + metrics, gives those tools exactly the SQL surface they expect — without leaking warehouse joins to the user.
Semantic QL is OBSL's take on a well-trodden pattern — Cube SQL API, dbt
Semantic Layer JDBC, AtScale, and Snowflake SEMANTIC_VIEW(...) all
expose semantic models as virtual SQL surfaces. The differentiator:
Semantic QL ships with explicit measure markers (MEASURE("X")),
aggregate-wrap matching (SUM(sum_measure) validated against the
declared aggregation), and first-class hierarchical subtotals
(WITH ROLLUP / WITH CUBE) — see the corresponding sections below.
Accepted shape¶
SELECT <dimension or measure labels>
[FROM <model_name>]
[WHERE <predicates>]
[HAVING <predicates>]
[WITH ROLLUP | WITH CUBE]
[ORDER BY <label or position> [ASC | DESC] [NULLS FIRST | NULLS LAST] [, ...]]
[LIMIT <n>]
[OFFSET <n>]
FROM is optional on a connection where the model is implicit (single-model
mode, or multi-model with the database selector set). SELECT "dim", "measure"
means the same as SELECT "dim", "measure" FROM <model> — see
No-FROM mode below.
WITH ROLLUP / WITH CUBE is accepted either trailing or in its
spec-correct slot before ORDER BY / LIMIT.
| Clause | Rules |
|---|---|
SELECT |
Bare identifiers, MEASURE("<label>"), or an aggregate wrapper that matches the measure's declared aggregation (SUM("X") on a SUM-measure, COUNT(DISTINCT "X") on a count_distinct-measure, etc.). Mismatched wraps (MIN on a SUM-measure) and any wrap on a metric reject with a message naming the declared aggregation. SELECT * and free-form expressions are rejected. |
FROM |
The model's virtual table, or omitted entirely. Any other target is rejected with RAW_SQL_REJECTED — OBSL is closed by design, there is no env flag to bypass. |
WHERE |
column op literal atoms joined by AND. Measure / metric references are auto-routed to HAVING. Top-level OR is rejected. |
HAVING |
Same shape as WHERE; passes through unchanged. |
GROUP BY |
Silently ignored — implicit from the dimensions in SELECT. BI tools auto-emit it; we tolerate it for compatibility. |
ORDER BY |
Identifier (must be a SELECT alias) or 1-based position. Optional ASC/DESC and NULLS FIRST/NULLS LAST. Without the null position, dialect defaults apply (Postgres / Snowflake / DuckDB / Dremio put NULLs last on ASC; MySQL / ClickHouse / BigQuery / Databricks the opposite — set it explicitly for portable behavior). |
LIMIT |
Integer literal. |
OFFSET |
Integer literal. Useful for keyset / page pagination after ORDER BY. |
WITH ROLLUP / WITH CUBE |
Trailing modifier (or spec-position before ORDER BY) — see below. |
Selecting measures: three accepted forms¶
The translator accepts three syntactically distinct ways to reference a
measure or metric in SELECT:
-- 1. Bare label — terse, recommended for hand-written SQL
SELECT "Region", "Total Sales" FROM sales_model
-- 2. MEASURE() marker — Snowflake SEMANTIC_VIEW / Databricks metric-view syntax
SELECT "Region", MEASURE("Total Sales") FROM sales_model
-- 3. Aggregate wrapper that matches the measure's declared aggregation
SELECT "Region", SUM("Total Sales") FROM sales_model
All three compile to identical vendor SQL — the wrapper is stripped at translation time.
Rule for aggregate wrappers¶
For measures, the wrapping aggregate must equal the measure's
declared aggregation. Mismatches reject with a clear error:
| Measure declares | Accepted wrap | Rejected example |
|---|---|---|
sum |
SUM("X") |
MIN("X"), AVG("X") |
count |
COUNT("X") |
SUM("X") |
count_distinct |
COUNT(DISTINCT "X") |
COUNT("X") |
avg |
AVG("X") |
other |
min / max |
matching MIN("X") / MAX("X") |
crossing them |
Error message names the declared aggregation so the caller can fix it:
[UNSUPPORTED_SQL_FEATURE]MeasureOrder Countis declared asCOUNT— applyingSUMwould change its math. UseCOUNT("Order Count"), bare"Order Count", orMEASURE("Order Count").
Wrappers on metrics¶
Metrics reject every aggregate wrapper — a derived expression like
Revenue per Order = SUM(amount) / COUNT(order_id) is already evaluated
at the query's grain, and no outer aggregate is mathematically correct.
Use bare "Revenue per Order" or MEASURE("Revenue per Order").
[UNSUPPORTED_SQL_FEATURE]MetricRevenue per Orderis a derived expression already evaluated at the query's grain — applyingSUM(...)would change its math. Use bare"Revenue per Order"orMEASURE("Revenue per Order").
Why this matters¶
BI tools (Tableau, Power BI, Metabase, Superset) emit SUM(measure_col)
or COUNT(measure_col) reflexively when you drop a measure on a viz.
For SUM-typed measures this now works seamlessly. For metrics, the user
sees a concrete error pointing at the right syntax instead of silently
getting wrong numbers (e.g., the classic "sum of per-region ratios"
trap). Honesty over convenience — the semantic layer exists to make the
math right.
Multi-model addressing (v2.4.0+)¶
Start the server with multiple OBML files pre-loaded:
export MODEL_FILES=examples/sales.yaml,examples/returns.yaml,examples/finance.yaml
uv run orionbelt-api
Each model becomes its own Flight SQL catalog. BI tools pick one via the Database field in their connection dialog:
DBeaver:
Connection → Database: sales
Tableau:
Same field on the Arrow Flight JDBC connector
psql (pgwire, v2.5.0+):
postgresql://obsl:KEY@host:5432/sales
pyarrow programmatic:
import pyarrow.flight as flight
options = flight.FlightCallOptions(headers=[(b"database", b"sales")])
info = client.get_flight_info(
flight.FlightDescriptor.for_command(b'SELECT "Region", "Total Sales" FROM sales'),
options=options,
)
Model naming rules¶
Each OBML defines its addressing key via a top-level name: field; if
unset, the filename stem is used. Both go through identical
normalization:
1. lowercase
2. replace runs of [space, dot, dash] with underscore
3. collapse underscore runs
4. strip leading/trailing underscores
5. strip a trailing `_obml` suffix
6. validate against ^[a-z][a-z0-9_]{0,62}$
Examples: My Sales Model → my_sales_model, commerce.v2 →
commerce_v2, sales.obml.yaml (filename) → sales.
Reserved names (rejected at startup): obsl, obml, obsql, model,
default, public, information_schema, pg_catalog,
sqlite_master, mysql, sys, admin, root.
Name collisions across MODEL_FILES entries error at startup with a
clear message — no silent shadowing.
Discovery¶
GET /v1/models lists every loaded model:
curl http://localhost:8000/v1/models
{
"models": [
{"name": "sales", "description": "...", "dimensions": 12, "measures": 8, ...},
{"name": "returns", "description": "...", "dimensions": 9, "measures": 5, ...}
],
"count": 2
}
Auto-resolve¶
If exactly one model is loaded (single-model mode or MODEL_FILES
with one entry), no selector is needed — every connection targets it
implicitly.
What happens without a selector when ambiguous¶
[NO_MODEL_SELECTED] Multiple models are loaded and no selector was sent
on this connection. Pick one by setting the connection's `database`
field (or `x-obsl-model` header). Available models: returns, sales.
DBeaver: Connection → Database field = <name>
Tableau: Same field on the Arrow Flight JDBC connector
pyarrow: options = flight.FlightCallOptions(
headers=[(b'database', b'<name>')])
REST: Use /v1/sessions/<name>/query/semantic-ql
Discover available models via GET /v1/models.
The error itself is the documentation — no need to remember anything.
No-FROM mode — implicit model¶
On a connection with a resolved model (single-model, or multi-model with
the database selector set), FROM is optional. These two queries are
equivalent:
SELECT "Customer Country", "Total Revenue"
SELECT "Customer Country", "Total Revenue" FROM sales_model
Classification routes a no-FROM SELECT to semantic mode when every
column identifier matches a dim / measure / metric on the resolved model.
Unknown identifiers reject with RAW_SQL_REJECTED — not
UNKNOWN_SELECT_ITEM — so users get a clear "this isn't a query against
the model" signal rather than a translator error.
SELECT 1, SELECT version(), and other canned scalar probes keep their
catalog routing (BI tools rely on these for connectivity checks).
Smoke-test from the terminal¶
The repo ships examples/obsql.py — a ~150-line pyarrow.flight CLI for
running OBSQL without a BI tool. Naming follows the psql / snowsql
convention (the CLI shares its name with the language it runs).
# Single-model or auto-resolve
uv run python examples/obsql.py 'SELECT version()'
uv run python examples/obsql.py 'SHOW TABLES'
uv run python examples/obsql.py 'SELECT "Region", "Total Sales" LIMIT 5'
# Multi-model — pick with -m / --model
uv run python examples/obsql.py -m sales 'SHOW TABLES'
# Discover loaded models via REST /v1/models
uv run python examples/obsql.py --list --rest-port 8000
# Verify governance
uv run python examples/obsql.py 'DROP TABLE foo' # → WRITE_OPERATION_REJECTED
uv run python examples/obsql.py 'SELECT * FROM customers' # → RAW_SQL_REJECTED
The CLI sets the database gRPC metadata header from -m (matching what
the Arrow Flight SQL JDBC driver sends when DBeaver's Database field is
set), so it exercises the exact same routing path BI tools take.
Raw mode — detail rows via qualified columns¶
Sometimes you need un-aggregated rows from a data object — the OBML
"raw mode" shape. Trigger it by writing qualified "DataObject"."column"
references in SELECT:
SELECT "Customers"."Customer Name", "Customers"."Country"
FROM sales_model
WHERE "Customers"."Country" = 'US'
ORDER BY "Customers"."Country"
LIMIT 100
Compiles to a QueryObject with select.fields=[...] (no aggregation,
no joins). Detection rule:
- Every SELECT item must be
"<DataObject>"."<column>" - WHERE predicates target qualified columns the same way
DISTINCTis honoured (SELECT DISTINCT "Customers"."Country" FROM ...)HAVING,GROUP BY, andWITH ROLLUPare rejected in raw mode- Mixing a qualified raw column with a bare dim/measure →
MIXED_RAW_AND_AGGREGATE_MODE
Raw mode is the OBSQL equivalent of REST /query/execute with
select.fields=[...]. It bypasses the dim/measure abstraction but stays
inside the semantic layer — model-defined row-level filters still
apply, no joins, no warehouse-side ad-hoc SQL.
Rejected SQL¶
The translator rejects shapes that don't fit the semantic model with stable error codes:
| Code | Triggered by |
|---|---|
UNKNOWN_SELECT_ITEM |
SELECT item that's not a known dim / measure / metric. |
UNKNOWN_FILTER_FIELD |
WHERE / HAVING field that's not a known dim / measure / metric. |
UNKNOWN_ORDER_BY_FIELD |
ORDER BY identifier missing from SELECT. |
INVALID_ORDER_BY_POSITION |
Numeric position outside [1, n]. |
UNSUPPORTED_SQL_FEATURE |
JOIN, CTE, subquery, UNION, window function, SELECT *, aggregate call wrapped around a measure, top-level OR. |
RAW_SQL_REJECTED |
Flight: FROM target is not the virtual table and not a catalog source. Raw warehouse SQL is never accepted (no flag to bypass). |
WRITE_OPERATION_REJECTED |
Flight or REST: INSERT / UPDATE / DELETE / DROP / CREATE / ALTER / TRUNCATE / MERGE / GRANT / REVOKE / COMMIT / ROLLBACK. OBSL is read-only. |
MIXED_RAW_AND_AGGREGATE_MODE |
SELECT mixes qualified raw-mode columns ("DataObject"."column") with bare dim/measure labels. Use one form consistently. |
Hierarchical subtotals: WITH ROLLUP / WITH CUBE¶
Add a trailing modifier to compute hierarchical subtotals (ROLLUP) or
the full cross-tab (CUBE):
The compiler emits the dialect-appropriate form:
| Dialect | Emitted SQL |
|---|---|
| Postgres, Snowflake, Databricks, DuckDB, Dremio, BigQuery, MySQL | GROUP BY ROLLUP(a, b) / GROUP BY CUBE(a, b) |
| ClickHouse | GROUP BY a, b WITH ROLLUP / GROUP BY a, b WITH CUBE |
For every selected dimension, a GROUPING(dim) AS _g_<dim> column is
appended to the result schema. 0 means the row carries a real value for
that dimension; 1 means it was rolled up (NULL value in the dim column).
The flag columns are the only reliable way to tell a subtotal row from a
detail row whose dim is legitimately NULL.
Measure additivity under rollup¶
OBSL doesn't classify measures by additivity or rewrite their SQL. The database recomputes each grouping level from base rows:
- Additive measures (
SUM,COUNT): subtotals sum to higher levels and to the grand total. - Non-additive measures (
COUNT(DISTINCT),AVG, percentiles, ratios defined asAVG(x/y)): each grouping level is individually correct, but subtotals do not sum to the grand total. Mathematically expected; not a bug. - Weighted ratios (
SUM(x) / SUM(y)): roll up correctly at every level.
Restrictions¶
WITH ROLLUP/WITH CUBErequire at least one dimension inSELECT.- The two are mutually exclusive.
- Combining rollup/cube with
total: truemeasures, period-over-period metrics, or cumulative metrics emits anINCOMPATIBLE_COMBINATIONwarning. The query still runs but the_g_*flag columns may not appear in the final projection.
REST examples¶
Execute¶
curl -X POST http://localhost:8000/v1/query/semantic-ql \
-H 'content-type: application/json' \
-d '{
"sql": "SELECT \"Region\", \"Total Sales\" FROM sales_model WITH ROLLUP",
"dialect": "duckdb"
}'
The response shape matches /v1/query/execute: sql, dialect,
columns, rows, explain, plus the freshness-cache metadata.
Compile-only (debugging)¶
curl -X POST http://localhost:8000/v1/query/semantic-ql/compile \
-H 'content-type: application/json' \
-d '{ "sql": "SELECT \"Region\", \"Total Sales\" FROM sales_model" }'
The compile response also includes the translated QueryObject JSON
under query, so you can see exactly what your SQL became.
Arrow Flight SQL¶
The Flight server classifies every incoming SQL by its first FROM
target:
| FROM target | Mode | Behavior |
|---|---|---|
<model_name> (virtual table) |
semantic |
Translated → compiled → executed |
SHOW TABLES, DESCRIBE, information_schema.*, pg_catalog.*, SELECT version() / current_schema() / SELECT 1 |
catalog |
Answered from the model; never touches the warehouse |
| anything else | rejected |
RAW_SQL_REJECTED — no flag to bypass |
DDL/DML (INSERT, UPDATE, DELETE, DROP, CREATE, ALTER, TRUNCATE, …) |
rejected |
WRITE_OPERATION_REJECTED — OBSL is read-only by design |
Governance — closed by design¶
There are no env flags to enable raw SQL or write operations. OBSL is a semantic layer, not a JDBC proxy. The only thing the warehouse ever receives is SQL produced by the OBSL compiler:
| Source | What reaches the warehouse |
|---|---|
| Semantic QL via Flight | Compiled SQL from CompilationPipeline |
QueryObject via REST /query/execute |
Compiled SQL from CompilationPipeline |
| OBML YAML via Flight | Compiled SQL from CompilationPipeline |
| Catalog discovery (SHOW / DESCRIBE / information_schema / pg_catalog) | Nothing — answered from the model in-process |
| Anything else | Nothing — rejected at the door |
Operators cannot accidentally open a hole. There is no "raw SQL" mode, no admin override, no escape hatch. If you need direct warehouse access, use the warehouse's own clients — not OBSL.
The Flight catalog (CommandGetTables, CommandGetColumns,
ListFlights) lists the semantic virtual table first with its
dimension / measure / metric columns, plus the _dimensions,
_measures, and _metrics metadata views. Data-object physical
columns are not exposed.
Schema probe shortcut¶
When a Flight client calls GetFlightInfo for a semantic-mode query, the
result schema is built directly from the model — no warehouse round-trip
is needed to learn the column types. Faster catalog navigation; no
spurious EXPLAIN-shaped queries hit the database.
BI tool setup¶
The general recipe across BI tools that speak Flight SQL:
- Install the Apache Arrow Flight SQL JDBC
.jar. - Connect to
jdbc:arrow-flight-sql://<host>:8815?useEncryption=false. - Browse the schema — you'll see one table per loaded model, with columns labelled by dimension / measure / metric.
- In the SQL editor, write Semantic QL against that virtual table:
SELECT "Region", "Total Sales"
FROM sales_model
WHERE "Year" = 2025
ORDER BY "Total Sales" DESC
LIMIT 100
The semantic layer takes care of every join, every aggregate, every row- level rule. BI tools see exactly the columns they're allowed to combine.