Multi-Fact: Sales & Returns¶
This example demonstrates the Composite Fact Layer (CFL) planner with a multi-fact scenario. When a query combines measures from different fact tables, OrionBelt uses a CTE with UNION ALL to stitch the fact sets together (with NULL padding), then aggregates in the outer query.
Scenario¶
An e-commerce company tracks store sales and store returns as separate fact tables, both sharing conformed dimensions: Customers and Date Dim.
graph LR
SS["<b>Store Sales</b><br/><i>fact</i><br/>Sale ID<br/>Customer ID<br/>Date Key<br/>Amount"]
SR["<b>Store Returns</b><br/><i>fact</i><br/>Return ID<br/>Customer ID<br/>Date Key<br/>Refund Amt"]
C["<b>Customers</b><br/>Customer ID<br/>Name<br/>Country"]
D["<b>Date Dim</b><br/>Date Key<br/>Full Date<br/>Year<br/>Quarter"]
SS -->|many-to-one| C
SS -->|many-to-one| D
SR -->|many-to-one| C
SR -->|many-to-one| D
Both fact tables join to the same dimension tables — these are conformed dimensions.
The Model¶
# yaml-language-server: $schema=schema/obml-schema.json
version: 1.0
dataObjects:
Customers:
code: CUSTOMERS
database: WAREHOUSE
schema: PUBLIC
columns:
Customer ID:
code: CUSTOMER_ID
abstractType: string
Customer Name:
code: NAME
abstractType: string
Country:
code: COUNTRY
abstractType: string
Date Dim:
code: DATE_DIM
database: WAREHOUSE
schema: PUBLIC
columns:
Date Key:
code: DATE_KEY
abstractType: int
Full Date:
code: FULL_DATE
abstractType: date
Year:
code: YEAR_NUM
abstractType: int
Quarter:
code: QUARTER_NUM
abstractType: int
Store Sales:
code: STORE_SALES
database: WAREHOUSE
schema: PUBLIC
columns:
Sale ID:
code: SALE_ID
abstractType: string
Customer ID:
code: CUSTOMER_ID
abstractType: string
Date Key:
code: DATE_KEY
abstractType: int
Amount:
code: AMOUNT
abstractType: float
joins:
- joinType: many-to-one
joinTo: Customers
columnsFrom:
- Customer ID
columnsTo:
- Customer ID
- joinType: many-to-one
joinTo: Date Dim
columnsFrom:
- Date Key
columnsTo:
- Date Key
Store Returns:
code: STORE_RETURNS
database: WAREHOUSE
schema: PUBLIC
columns:
Return ID:
code: RETURN_ID
abstractType: string
Customer ID:
code: CUSTOMER_ID
abstractType: string
Date Key:
code: DATE_KEY
abstractType: int
Refund Amount:
code: REFUND_AMT
abstractType: float
joins:
- joinType: many-to-one
joinTo: Customers
columnsFrom:
- Customer ID
columnsTo:
- Customer ID
- joinType: many-to-one
joinTo: Date Dim
columnsFrom:
- Date Key
columnsTo:
- Date Key
dimensions:
Customer Country:
dataObject: Customers
column: Country
resultType: string
Year:
dataObject: Date Dim
column: Year
resultType: int
measures:
Total Revenue:
columns:
- dataObject: Store Sales
column: Amount
resultType: float
aggregation: sum
Total Returns:
columns:
- dataObject: Store Returns
column: Refund Amount
resultType: float
aggregation: sum
metrics:
Net Revenue:
expression: '{[Total Revenue]} - {[Total Returns]}'
Query 1: Single-Fact (Star Schema)¶
When a query only uses measures from one fact table, OrionBelt uses the standard star schema planner — no CTEs needed.
query = QueryObject(
select=QuerySelect(
dimensions=["Customer Country"],
measures=["Total Revenue"],
),
order_by=[QueryOrderBy(field="Total Revenue", direction=SortDirection.DESC)],
limit=100,
)
Generated SQL (Postgres):
SELECT
"Customers"."COUNTRY" AS "Customer Country",
SUM("Store Sales"."AMOUNT") AS "Total Revenue"
FROM WAREHOUSE.PUBLIC.STORE_SALES AS "Store Sales"
LEFT JOIN WAREHOUSE.PUBLIC.CUSTOMERS AS "Customers"
ON "Store Sales"."CUSTOMER_ID" = "Customers"."CUSTOMER_ID"
GROUP BY "Customers"."COUNTRY"
ORDER BY "Total Revenue" DESC
LIMIT 100
This is a straightforward star schema query: one fact table (Store Sales) joined to one dimension table (Customers).
Query 2: Multi-Fact (CFL with UNION ALL)¶
When measures come from different fact tables, OrionBelt detects this and switches to the CFL planner. It stitches the facts together with UNION ALL inside a CTE, then aggregates in the outer query.
query = QueryObject(
select=QuerySelect(
dimensions=["Customer Country"],
measures=["Total Revenue", "Total Returns"],
),
order_by=[QueryOrderBy(field="Total Revenue", direction=SortDirection.DESC)],
limit=100,
)
Generated SQL (Postgres):
WITH composite_01 AS (
SELECT
"Customers"."COUNTRY" AS "Customer Country",
"Store Sales"."AMOUNT" AS "Total Revenue",
NULL AS "Total Returns"
FROM WAREHOUSE.PUBLIC.STORE_SALES AS "Store Sales"
LEFT JOIN WAREHOUSE.PUBLIC.CUSTOMERS AS "Customers"
ON "Store Sales"."CUSTOMER_ID" = "Customers"."CUSTOMER_ID"
UNION ALL
SELECT
"Customers"."COUNTRY" AS "Customer Country",
NULL AS "Total Revenue",
"Store Returns"."REFUND_AMT" AS "Total Returns"
FROM WAREHOUSE.PUBLIC.STORE_RETURNS AS "Store Returns"
LEFT JOIN WAREHOUSE.PUBLIC.CUSTOMERS AS "Customers"
ON "Store Returns"."CUSTOMER_ID" = "Customers"."CUSTOMER_ID"
)
SELECT
"Customer Country",
SUM("Total Revenue") AS "Total Revenue",
SUM("Total Returns") AS "Total Returns"
FROM composite_01
GROUP BY "Customer Country"
ORDER BY "Total Revenue" DESC
LIMIT 100
What Happened¶
- Measure grouping — The resolver detected that
Total Revenuecomes fromStore SalesandTotal Returnscomes fromStore Returns— two different fact tables - UNION ALL with NULL padding — Each fact leg selects the conformed dimensions plus its own measure columns, filling the other fact's measures with
NULL. This produces a single row set at the raw grain - Outer aggregation — The outer query groups by the conformed dimensions and applies
SUMover the unioned rows. Because each row only has a value for one fact's measures (NULLs are ignored bySUM), the results are correct - No fanout — The facts are never joined directly to each other, so no many-to-many explosion occurs
Snowflake: UNION BY NAME
On Snowflake, OrionBelt uses UNION BY NAME instead of UNION ALL. This matches columns by name rather than position, so the NULL padding columns can be omitted — each fact leg only selects its own measures.
Why CFL is Necessary¶
Without the Composite Fact Layer, combining measures from different fact tables in a single query would cause fanout — a many-to-many join that inflates aggregation results:
graph LR
SS["Store Sales<br/>10 rows"] --- C["Customers"]
SR["Store Returns<br/>5 rows"] --- C
style C fill:#f96,stroke:#333
Direct join: 10 x 5 = 50 rows — wrong SUM results!
By stacking the facts with UNION ALL and aggregating outside, CFL eliminates fanout entirely. Each row in the union contributes to exactly one fact's measures.
When CFL Activates
The compiler automatically selects CFL when requires_cfl = True in the resolved query. This happens when the query's measures reference fields from multiple distinct fact tables. You do not need to opt in — OrionBelt detects it for you.
Query 3: Net Revenue Metric¶
Metrics that combine measures from different fact tables also use CFL:
query = QueryObject(
select=QuerySelect(
dimensions=["Customer Country", "Year"],
measures=["Net Revenue"],
),
)
Generated SQL (Postgres):
WITH composite_01 AS (
SELECT
"Customers"."COUNTRY" AS "Customer Country",
"Date Dim"."YEAR_NUM" AS "Year",
"Store Sales"."AMOUNT" AS "Total Revenue",
NULL AS "Total Returns"
FROM WAREHOUSE.PUBLIC.STORE_SALES AS "Store Sales"
LEFT JOIN WAREHOUSE.PUBLIC.CUSTOMERS AS "Customers"
ON "Store Sales"."CUSTOMER_ID" = "Customers"."CUSTOMER_ID"
LEFT JOIN WAREHOUSE.PUBLIC.DATE_DIM AS "Date Dim"
ON "Store Sales"."DATE_KEY" = "Date Dim"."DATE_KEY"
UNION ALL
SELECT
"Customers"."COUNTRY" AS "Customer Country",
"Date Dim"."YEAR_NUM" AS "Year",
NULL AS "Total Revenue",
"Store Returns"."REFUND_AMT" AS "Total Returns"
FROM WAREHOUSE.PUBLIC.STORE_RETURNS AS "Store Returns"
LEFT JOIN WAREHOUSE.PUBLIC.CUSTOMERS AS "Customers"
ON "Store Returns"."CUSTOMER_ID" = "Customers"."CUSTOMER_ID"
LEFT JOIN WAREHOUSE.PUBLIC.DATE_DIM AS "Date Dim"
ON "Store Returns"."DATE_KEY" = "Date Dim"."DATE_KEY"
)
SELECT
"Customer Country",
"Year",
(SUM("Total Revenue") - SUM("Total Returns")) AS "Net Revenue"
FROM composite_01
GROUP BY "Customer Country", "Year"
The metric expression {[Total Revenue]} - {[Total Returns]} is applied to the aggregated sums in the outer SELECT. Both measures are aggregated from the same UNION ALL result set, with NULLs naturally ignored by SUM.
Multiple Conformed Dimensions
When there are multiple conformed dimensions (here: Customer Country and Year), both appear in the GROUP BY of the outer query. Each UNION leg joins its own dimension tables independently, so the grain is preserved correctly.