OrionBelt ML (OBML) Model Format¶
OrionBelt ML (OBML) is the YAML-based format for defining semantic models in OrionBelt. A model describes your data warehouse tables (data objects), business dimensions, aggregate measures, and composite metrics.
Top-Level Structure¶
# yaml-language-server: $schema=schema/obml-schema.json
version: 1.0
owner: team-data # Optional: model-level owner
dataObjects: # Database tables/views with columns and joins
...
dimensions: # Named dimensions referencing data object columns
...
measures: # Aggregations with expressions
...
metrics: # Composite metrics combining measures
...
All four sections are dictionaries keyed by name.
Owner Field¶
Every level of the model supports an optional owner field — a free-text string identifying the responsible team or person. The owner is returned in model discovery API responses.
version: 1.0
owner: team-data
dataObjects:
Orders:
owner: team-sales
columns:
Price:
owner: team-finance
dimensions:
Country:
owner: team-analytics
measures:
Revenue:
owner: team-analytics
metrics:
Revenue per Order:
owner: team-analytics
Data Objects¶
A data object maps to a database table or custom SQL statement. Each data object declares its columns and optional join relationships.
dataObjects:
Orders:
code: ORDERS # Table name or custom SQL
database: WAREHOUSE # Database/catalog
schema: PUBLIC # Schema
columns:
Order ID:
code: ORDER_ID # Physical column name
abstractType: string
Order Date:
code: ORDER_DATE
abstractType: date
Customer ID:
code: CUSTOMER_ID
abstractType: string
Price:
code: PRICE
abstractType: float
numClass: non-additive
joins:
- joinType: many-to-one
joinTo: Customers
columnsFrom:
- Customer ID
columnsTo:
- Customer ID
Data Object Properties¶
| Property | Type | Required | Description |
|---|---|---|---|
code |
string | Yes | Table name or SQL statement |
database |
string | Yes | Database/catalog name |
schema |
string | Yes | Schema name |
columns |
map | Yes | Dictionary of column definitions |
joins |
list | No | Join relationships to other data objects |
comment |
string | No | Documentation |
synonyms |
list | No | Alternative names or terms (LLM hints) |
owner |
string | No | Responsible team or person |
Columns¶
| Property | Type | Required | Description |
|---|---|---|---|
code |
string | Yes | Physical column name in the database |
abstractType |
enum | Yes | string, int, float, date, time, time_tz, timestamp, timestamp_tz, boolean, json |
sqlType |
string | No | Informational: SQL data type (e.g. VARCHAR, INTEGER, NUMERIC(10,2)) |
sqlPrecision |
int | No | Informational: numeric precision |
sqlScale |
int | No | Informational: numeric scale |
numClass |
enum | No | Classification of numeric columns to control aggregation behavior. categorical (IDs/codes), additive (sum-safe), non-additive (rates/ratios) |
comment |
string | No | Documentation |
synonyms |
list | No | Alternative names or terms (LLM hints) |
owner |
string | No | Responsible team or person |
Joins¶
Joins define relationships between data objects. The data object that declares the join is the "from" side.
| Property | Type | Required | Description |
|---|---|---|---|
joinType |
enum | Yes | many-to-one, one-to-one, many-to-many |
joinTo |
string | Yes | Target data object name |
columnsFrom |
list | Yes | Column names in this data object (join keys) |
columnsTo |
list | Yes | Column names in the target data object (join keys) |
secondary |
bool | No | Mark as a secondary (alternative) join path (default: false) |
pathName |
string | No | Unique name for this join path (required when secondary: true) |
Fact tables declare joins
By convention, fact tables (e.g., Orders) declare joins to dimension tables (e.g., Customers, Products). The compiler uses this to identify fact tables — data objects with joins are preferred as base objects during query resolution.
Secondary Joins¶
When a data object has multiple relationships to the same target (e.g., a Flights table joining to Airports via both departure and arrival), mark the additional joins as secondary with a unique pathName:
dataObjects:
Flights:
code: FLIGHTS
database: WAREHOUSE
schema: PUBLIC
columns:
Departure Airport:
code: DEP_AIRPORT
abstractType: string
Arrival Airport:
code: ARR_AIRPORT
abstractType: string
joins:
- joinType: many-to-one
joinTo: Airports
columnsFrom:
- Departure Airport
columnsTo:
- Airport ID
- joinType: many-to-one
joinTo: Airports
secondary: true
pathName: arrival
columnsFrom:
- Arrival Airport
columnsTo:
- Airport ID
Rules:
- Every secondary join must have a
pathName pathNamemust be unique per(source, target)pair (not globally)- Secondary joins are excluded from cycle detection and multipath validation
- Queries use
usePathNamesto select a secondary join instead of the default primary — see Query Language
Column References¶
Columns are referenced using the dataObject + column pair throughout the model:
Column names must be unique within each data object. Dimensions, measures, and metrics must have unique names across the whole model.
Dimensions¶
A dimension defines a business attribute used for grouping (GROUP BY) in queries.
dimensions:
Customer Country:
dataObject: Customers
column: Country
resultType: string
Order Date:
dataObject: Orders
column: Order Date
resultType: date
timeGrain: month
Dimension Properties¶
| Property | Type | Required | Description |
|---|---|---|---|
dataObject |
string | Yes | Source data object name |
column |
string | Yes | Column name in the data object |
resultType |
enum | Yes | Data type of the result (informative only, not used for SQL generation) |
label |
string | No | Display label |
timeGrain |
enum | No | Time grain: year, quarter, month, week, day, hour, minute, second |
format |
string | No | Display format |
synonyms |
list | No | Alternative names or terms (LLM hints) |
owner |
string | No | Responsible team or person |
Time Dimensions¶
Set timeGrain to apply time grain truncation:
This generates date_trunc('month', col) in Postgres/Snowflake or toStartOfMonth(col) in ClickHouse.
You can also apply time grain at query time using the "dimension:grain" syntax — see Query Language.
Measures¶
A measure defines an aggregate computation over data object columns.
Simple Measure (single column)¶
measures:
Order Count:
columns:
- dataObject: Orders
column: Order ID
resultType: int
aggregation: count
Expression Measure (computed from columns)¶
Reference columns directly in the expression using {[DataObject].[Column]}:
measures:
Revenue:
resultType: float
aggregation: sum
expression: '{[Orders].[Price]} * {[Orders].[Quantity]}'
measures:
Profit:
resultType: float
aggregation: sum
expression: '{[Sales].[Salesamount]} - {[Sales].[Salescosts]}'
total: true
Measure Properties¶
| Property | Type | Required | Description |
|---|---|---|---|
columns |
list | No | List of column references (dataObject+column) for simple single-column measures |
resultType |
enum | Yes | Data type of the result (informative only, not used for SQL generation) |
aggregation |
enum | Yes | sum, count, count_distinct, avg, min, max, listagg |
expression |
string | No | Expression with {[DataObject].[Column]} placeholders |
distinct |
bool | No | Apply DISTINCT to aggregation |
total |
bool | No | Use the total (unfiltered) value when referenced in a metric |
delimiter |
string | No | Separator for listagg aggregation (default: ",") |
withinGroup |
object | No | Ordering clause for listagg — specifies column and order (ASC/DESC) |
filters |
list | No | Filters applied to this measure (supports AND/OR/NOT groups) |
allowFanOut |
bool | No | Allow fan-out joins (default: false) |
synonyms |
list | No | Alternative names or terms (LLM hints) |
owner |
string | No | Responsible team or person |
Aggregation Types¶
| Type | SQL | Example |
|---|---|---|
sum |
SUM(expr) |
Total revenue |
count |
COUNT(expr) |
Number of orders |
count_distinct |
COUNT(DISTINCT expr) |
Unique customers |
avg |
AVG(expr) |
Average price |
min |
MIN(expr) |
Earliest date |
max |
MAX(expr) |
Latest date |
any_value |
ANY_VALUE(expr) |
Any single value from the group (any() in ClickHouse) |
median |
MEDIAN(expr) |
Median value (PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY ...) in Postgres) |
mode |
MODE(expr) |
Most frequent value (MODE() WITHIN GROUP (ORDER BY ...) in Postgres, topK(1)(col)[1] in ClickHouse; not supported in Dremio) |
listagg |
LISTAGG(expr, sep) |
Concatenated values (dialect-specific: STRING_AGG in Postgres, ARRAY_JOIN(COLLECT_LIST(...)) in Databricks, arrayStringConcat(groupArray(...)) in ClickHouse) |
Expression Placeholders¶
| Placeholder | Resolves to |
|---|---|
{[DataObject].[Column]} |
Column reference by data object and column name |
Measure Filters¶
Apply filters to a measure so it only aggregates matching rows. The filters property accepts a list of leaf filters and filter groups.
Single filter¶
measures:
Sales Profit Ratio:
resultType: float
aggregation: sum
expression: '({[Sales].[Salesamount]} / {[Sales].[Salescosts]}) * 100'
filters:
- column:
dataObject: Sales
column: Salescosts
operator: gt
values:
- dataType: float
valueFloat: 100.00
Multiple filters with AND/OR logic¶
Use filter groups for boolean combinations:
measures:
Domestic Revenue:
columns:
- dataObject: Line Items
column: Extended Price
resultType: float
aggregation: sum
filters:
- logic: or
filters:
- column:
dataObject: Nations
column: Name
operator: equals
values:
- dataType: string
valueString: UNITED STATES
- column:
dataObject: Nations
column: Name
operator: equals
values:
- dataType: string
valueString: CANADA
Filter Group Properties¶
| Property | Type | Default | Description |
|---|---|---|---|
logic |
enum | and |
and or or — how to combine child filters |
filters |
list | — | Child filters (leaf filters or nested filter groups) |
negated |
bool | false |
Wrap the combined expression with NOT |
Multiple top-level filters are combined with AND. Filter groups and leaf filters can be mixed freely and nested recursively.
How Measure Filters Compile¶
Filtered measures compile to CASE WHEN inside the aggregate function. The implicit ELSE NULL is ignored by all aggregate functions (SUM, COUNT, AVG, etc.):
-- Unfiltered: SUM("extendedprice" * (1 - "discount"))
-- Filtered: SUM(CASE WHEN "returnflag" = 'R'
-- THEN "extendedprice" * (1 - "discount") END)
This works with all planners (star, CFL, cumulative, period-over-period) and all 8 dialects. Filtered measures can be combined with unfiltered measures in ratio metrics:
LISTAGG Measures¶
Use listagg to concatenate column values into a delimited string. OrionBelt renders the correct SQL for each database dialect automatically.
measures:
Product Names:
columns:
- dataObject: Products
column: Product Name
resultType: string
aggregation: listagg
delimiter: ', '
withinGroup:
column:
dataObject: Products
column: Product Name
order: ASC
The delimiter defaults to "," if omitted. The withinGroup clause is optional and specifies ordering of the concatenated values.
Metrics¶
Metrics come in three types: derived (composite expression), cumulative (window function over a measure), and period-over-period (time comparison).
Derived Metrics¶
A derived metric combines multiple measures into a KPI. The expression references measures by name using {[Measure Name]} template syntax.
metrics:
Revenue per Order:
expression: '{[Revenue]} / {[Order Count]}'
Net Revenue:
expression: '{[Sales Amount]} - {[Return Amount]}'
All artefacts (data objects, dimensions, measures, metrics) have unique names. The {[Name]} placeholders in a metric expression must match existing measure names exactly.
Cumulative Metrics¶
A cumulative metric applies a window function to an existing measure, ordered by a time dimension. Three patterns are supported:
| Pattern | Configuration | SQL Frame |
|---|---|---|
| Running total | (default — no window or grainToDate) |
ROWS UNBOUNDED PRECEDING |
| Rolling window | window: N |
ROWS BETWEEN N-1 PRECEDING AND CURRENT ROW |
| Grain-to-date | grainToDate: month |
PARTITION BY DATE_TRUNC('month', ...) + unbounded |
metrics:
# Running total (unbounded cumulative sum)
Cumulative Revenue:
type: cumulative
measure: Revenue
timeDimension: Order Date
description: Running total of revenue
# Rolling 7-period average
7-Day Rolling Avg Revenue:
type: cumulative
measure: Revenue
timeDimension: Order Date
cumulativeType: avg
window: 7
# Month-to-Date (resets each month)
MTD Revenue:
type: cumulative
measure: Revenue
timeDimension: Order Date
grainToDate: month
# Year-to-Date (resets each year)
YTD Revenue:
type: cumulative
measure: Revenue
timeDimension: Order Date
grainToDate: year
# Rolling peak
30-Day Peak Revenue:
type: cumulative
measure: Revenue
timeDimension: Order Date
cumulativeType: max
window: 30
Time dimension requirement
The timeDimension must be included in the query's selected dimensions. Cumulative metrics without their time dimension in the SELECT will raise a validation error.
Period-over-Period Metrics¶
A period-over-period metric compares a measure against a prior time period. The expression references the base measure, and the periodOverPeriod block configures how to shift time and compute the comparison.
metrics:
Revenue YoY Growth:
type: period_over_period
expression: '{[Revenue]}'
periodOverPeriod:
timeDimension: Order Date
grain: month
offset: -1
offsetGrain: year
comparison: percentChange
Revenue MoM Diff:
type: period_over_period
expression: '{[Revenue]}'
periodOverPeriod:
timeDimension: Order Date
grain: month
offset: -1
offsetGrain: month
comparison: difference
Four comparison modes are available:
| Comparison | Formula | Use case |
|---|---|---|
percentChange |
current / NULLIF(prev, 0) - 1 |
YoY growth rate |
ratio |
current / NULLIF(prev, 0) |
Current-to-previous ratio |
difference |
current - prev |
Absolute change |
previousValue |
prev |
Prior period value alongside current |
Time dimension requirement
The timeDimension must be included in the query's selected dimensions. All PoP metrics in a single query must share the same timeDimension and grain.
For a detailed guide on PoP metrics, including CTE architecture, filter push-down, and dialect-specific SQL examples, see the Period-over-Period Metrics guide.
Metric Properties¶
| Property | Type | Default | Description |
|---|---|---|---|
type |
"derived" | "cumulative" | "period_over_period" |
"derived" |
Metric category |
expression |
string | — | Expression with {[Measure Name]} placeholders (required for derived and period_over_period) |
measure |
string | — | Name of base measure (required for cumulative) |
timeDimension |
string | — | Dimension used for ordering (required for cumulative) |
cumulativeType |
"sum" | "avg" | "min" | "max" | "count" |
"sum" |
Window aggregation function |
window |
integer | — | Rolling window size in periods (mutually exclusive with grainToDate) |
grainToDate |
"year" | "quarter" | "month" | "week" |
— | Reset boundary (mutually exclusive with window) |
periodOverPeriod |
object | — | Period-over-period configuration (required for period_over_period) |
label |
string | — | Display label |
description |
string | — | Business description |
format |
string | — | Display format |
synonyms |
list | — | Alternative names or terms (LLM hints) |
owner |
string | — | Responsible team or person |
Metric Expression Placeholders¶
| Placeholder | Resolves to |
|---|---|
{[Measure Name]} |
Named reference to any defined measure (derived metrics only) |
Synonyms¶
All five element levels (data object, column, dimension, measure, metric) support an optional synonyms list. Synonyms provide alternative names or terms that help LLMs map natural-language questions to the correct model element.
dataObjects:
Customers:
code: CUSTOMERS
database: WAREHOUSE
schema: PUBLIC
synonyms: [client, buyer, purchaser]
columns:
Country:
code: COUNTRY
abstractType: string
synonyms: [nation, region]
dimensions:
Customer Country:
dataObject: Customers
column: Country
synonyms: [client country, buyer country]
measures:
Revenue:
aggregation: sum
expression: '{[Orders].[Amount]}'
synonyms: [sales, income, turnover]
Synonyms are surfaced in the describe_model response (REST API and MCP) so LLMs can match user intent to the correct dimension, measure, or data object even when the user uses different terminology.
Custom Extensions¶
All six levels (model, data object, column, dimension, measure, metric) support an optional customExtensions array for vendor-specific metadata. OrionBelt preserves these during parsing and compilation but does not interpret them.
customExtensions:
- vendor: OSI
data: '{"instructions": "Use for retail analytics", "synonyms": ["sales"]}'
- vendor: GOVERNANCE
data: '{"owner": "data-team", "classification": "internal"}'
Custom Extension Properties¶
| Property | Type | Required | Description |
|---|---|---|---|
vendor |
string | Yes | Vendor or format identifier (e.g. OSI, GOVERNANCE) |
data |
string | Yes | Opaque data payload (typically a JSON string) |
Use cases:
- OSI interoperability: Preserving
ai_context(instructions, synonyms, examples) from OSI models during conversion - Governance tags: Owner, classification, cost center, lineage information
- Vendor-specific metadata: Any key-value data that OrionBelt should pass through without interpretation
Validation Rules¶
OrionBelt validates models against these rules:
- Unique identifiers — Column names unique within each data object; dimension, measure, and metric names unique across the model
- No cyclic joins — Join graph must be acyclic (secondary joins are excluded)
- No multipath joins — No ambiguous diamond patterns (secondary joins are excluded). A canonical join exception applies: when a data object has a direct join to a target AND also an indirect path through intermediaries, the direct join is treated as canonical and no error is raised. Only true diamonds (two indirect paths to the same target) are flagged.
- Secondary join constraints — Every secondary join must have a
pathName;pathNamemust be unique per(source, target)pair - Measures resolve — All column references in measures must point to existing data object columns
- Join targets exist — All
joinTotargets must be defined data objects - References resolve — All dimension references (dataObject/column) must resolve
Validation errors include source positions (line/column) when available.
Full Example¶
See the Sales Model Walkthrough for a complete annotated example.