Pattern

What is a Semantic Sidecar?

A governed semantic layer that runs alongside your existing data platforms instead of inside a BI tool or as a centralized rewrite. One model, many consumers, no architecture change.

The Pattern in One Paragraph

A Semantic Sidecar is a governed semantic layer that sits beside your existing data platforms (databases, lakehouses, BI tools, ML pipelines) and exposes business concepts (dimensions, measures, metrics, business rules) through a unified API. Instead of embedding semantics inside one BI tool or rewriting your stack around a centralized semantic platform, the sidecar pattern lets the same governed model serve AI agents via MCP, analytics workflows via REST, DB-API, or the PostgreSQL Wire Protocol, and reporting via Apache Arrow Flight SQL. The data stays where it is. The semantics live next to it, version-controlled and addressable by any consumer that needs them.

One sentence: the Semantic Sidecar pattern decouples business semantics from any single consumer, so AI agents, analytics, and data systems all query the same governed truth without architectural lock-in.

The Problem It Solves

Most organizations land in one of two failure modes around business semantics:

  1. Embedded-in-BI: the semantic model lives inside Tableau, Power BI, Looker, or another BI tool. Only that tool benefits. Your AI agents, data science notebooks, and ML pipelines all reinvent the same definitions, drift apart, and produce conflicting numbers.
  2. Centralized-rewrite: a "modern semantic layer" platform demands you route all queries through it, often with its own query engine, its own auth model, and its own SaaS billing. The architecture change is huge, and you still end up with vendor lock-in.

The Semantic Sidecar pattern is the third option. It treats semantics as a shared service that any consumer can call, without forcing anyone to change how they store or process data.

Sidecar vs Embedded vs Centralized

Aspect | Embedded (in BI) | Centralized platform | Semantic Sidecar
Architecture change | None | Major | None
Consumer scope | One BI tool | Anything routed through it | Any consumer (AI, BI, ML, API)
AI agent access | No | Sometimes (via plugin) | First-class (MCP, REST)
Version control | Tool-specific | Tool-specific | YAML in git
Vendor lock-in | High | High | Low (open source)
Where SQL is generated | BI tool | Centralized engine | Sidecar (custom AST)

How OrionBelt Implements the Semantic Sidecar

OrionBelt Semantic Layer (OBSL) is an open-source reference implementation of the Semantic Sidecar pattern. It is API-first and consumer-agnostic by design.

Full-circle architecture: AI agents and analytics tools query the Semantic Sidecar in business concepts; OBSL compiles dialect-specific SQL via its custom AST; the query engine (Dremio in this example) executes against the underlying data sources.

The model: declarative YAML (OBML)

You define dimensions, measures, metrics, business rules, joins, and semantic context in .obml.yaml files. These live in git, get reviewed in pull requests, and are versioned alongside the rest of your code. There is no proprietary modeling UI to learn and no SaaS lock-in.
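To make the "model as code" idea concrete, here is a minimal Python sketch of the kind of structure such a file might describe, plus one check a pull-request review could automate. The class names, field names, and the `validate` helper are illustrative assumptions, not the actual OBML schema or OBSL API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dimension:
    name: str
    table: str
    column: str


@dataclass(frozen=True)
class Measure:
    name: str
    table: str
    column: str
    aggregation: str  # e.g. "sum" or "count"


@dataclass(frozen=True)
class Metric:
    name: str
    expression: str            # arithmetic over measure names
    measures: tuple[str, ...]  # measures the expression depends on


@dataclass
class SemanticModel:
    """Hypothetical in-memory form of a model file (not the real OBML schema)."""
    dimensions: dict[str, Dimension] = field(default_factory=dict)
    measures: dict[str, Measure] = field(default_factory=dict)
    metrics: dict[str, Metric] = field(default_factory=dict)

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means every metric resolves."""
        problems = []
        for metric in self.metrics.values():
            for dep in metric.measures:
                if dep not in self.measures:
                    problems.append(
                        f"metric '{metric.name}' references unknown measure '{dep}'"
                    )
        return problems


model = SemanticModel(
    dimensions={"region": Dimension("region", "customers", "region")},
    measures={
        "revenue": Measure("revenue", "orders", "amount", "sum"),
        "order_count": Measure("order_count", "orders", "order_id", "count"),
    },
    metrics={
        "avg_order_value": Metric(
            "avg_order_value", "revenue / order_count", ("revenue", "order_count")
        ),
        # Deliberately broken: "cost" is not a defined measure.
        "margin": Metric("margin", "revenue - cost", ("revenue", "cost")),
    },
)

problems = model.validate()
print(problems)
```

Because the definitions are plain data, a check like this can run in CI on every pull request, so a metric that points at a missing measure is caught before any consumer queries it.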

The engine: a custom SQL AST

OBSL compiles your YAML model into an internal SQL Abstract Syntax Tree, then emits dialect-specific SQL for PostgreSQL, Snowflake, BigQuery, ClickHouse, Databricks, DuckDB/MotherDuck, Dremio, and MySQL. Because SQL is emitted from AST nodes rather than assembled by string templating, the output is syntactically valid by construction and injection-safe.
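The difference between emitting SQL from a tree and filling in a string template can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not OBSL's actual AST: the node classes, the `emit` and `quote` functions, and the two-dialect quoting table are all hypothetical.

```python
from dataclasses import dataclass

# Identifier quote characters for two dialects (assumption: a real compiler
# handles many more dialect differences than quoting).
QUOTE_CHARS = {"postgres": '"', "mysql": "`"}
ALLOWED_FUNCS = {"SUM", "COUNT", "AVG", "MIN", "MAX"}


@dataclass(frozen=True)
class Column:
    table: str
    name: str


@dataclass(frozen=True)
class Aggregate:
    func: str
    arg: Column
    alias: str


@dataclass(frozen=True)
class Select:
    from_table: str
    group_by: tuple      # tuple of Column
    aggregates: tuple    # tuple of Aggregate


def quote(identifier: str, dialect: str) -> str:
    """Quote an identifier, doubling any embedded quote character."""
    q = QUOTE_CHARS[dialect]
    return q + identifier.replace(q, q * 2) + q


def render_column(col: Column, dialect: str) -> str:
    return f"{quote(col.table, dialect)}.{quote(col.name, dialect)}"


def emit(node: Select, dialect: str) -> str:
    """Walk the tree and produce SQL text for one dialect."""
    dims = [render_column(c, dialect) for c in node.group_by]
    aggs = []
    for agg in node.aggregates:
        # Function names come from a fixed whitelist, never from raw input.
        if agg.func not in ALLOWED_FUNCS:
            raise ValueError(f"unsupported aggregate: {agg.func!r}")
        aggs.append(
            f"{agg.func}({render_column(agg.arg, dialect)}) AS {quote(agg.alias, dialect)}"
        )
    sql = f"SELECT {', '.join(dims + aggs)} FROM {quote(node.from_table, dialect)}"
    if dims:
        sql += f" GROUP BY {', '.join(dims)}"
    return sql


query = Select(
    from_table="orders",
    group_by=(Column("orders", "region"),),
    aggregates=(Aggregate("SUM", Column("orders", "amount"), "revenue"),),
)

pg_sql = emit(query, "postgres")
mysql_sql = emit(query, "mysql")
print(pg_sql)
print(mysql_sql)
```

The same tree yields double-quoted identifiers for PostgreSQL and backtick-quoted ones for MySQL. Since every identifier passes through `quote` and every function name is checked against a whitelist, a hostile name ends up inside a quoted identifier instead of altering the statement.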

Fan-trap prevention via CFL

Multi-fact queries spanning independent fact tables produce silent over-counting in naive SQL generation (the classic "fan trap"). OBSL's Composite Fact Layer (CFL) detects multi-fact queries, splits them, runs them independently, and combines via UNION ALL. AI agents and BI tools get correct numbers by construction.
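The over-counting is easy to reproduce. The sketch below uses Python's built-in `sqlite3` module with a made-up schema (`customers`, `orders`, `refunds`) and compares a naive two-fact join with a split-and-UNION ALL query. The SQL is an illustration of the general technique, not the SQL that the CFL actually generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders  (customer_id INTEGER, amount INTEGER);
    CREATE TABLE refunds (customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'EU');
    INSERT INTO orders  VALUES (1, 100), (1, 50);
    INSERT INTO refunds VALUES (1, 10), (1, 5);
""")

# Naive: joining both fact tables to the shared dimension pairs every order
# with every refund (2 x 2 rows), so both sums are doubled.
NAIVE_SQL = """
    SELECT c.region, SUM(o.amount) AS revenue, SUM(r.amount) AS refunded
    FROM customers c
    JOIN orders  o ON o.customer_id = c.id
    JOIN refunds r ON r.customer_id = c.id
    GROUP BY c.region
"""

# Split: one leg per fact table, padded with NULLs, stacked with UNION ALL,
# then aggregated once. SUM ignores the NULL padding.
SPLIT_SQL = """
    SELECT region, SUM(revenue) AS revenue, SUM(refunded) AS refunded
    FROM (
        SELECT c.region AS region, o.amount AS revenue, NULL AS refunded
        FROM customers c JOIN orders o ON o.customer_id = c.id
        UNION ALL
        SELECT c.region AS region, NULL AS revenue, r.amount AS refunded
        FROM customers c JOIN refunds r ON r.customer_id = c.id
    ) AS legs
    GROUP BY region
"""

naive = conn.execute(NAIVE_SQL).fetchall()
split = conn.execute(SPLIT_SQL).fetchall()

print(naive)  # [('EU', 300, 30)]  over-counted
print(split)  # [('EU', 150, 15)]  correct
```

The naive query returns no error and a plausible-looking number, which is why the fan trap is so easy to miss when SQL is written by hand or generated from a raw schema.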

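The unified API surface

OBSL exposes the same governed model through a REST API, an MCP server, a Gradio UI, DB-API 2.0, Apache Arrow Flight SQL, and a PostgreSQL Wire Protocol endpoint.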

When to Use a Semantic Sidecar

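The Semantic Sidecar pattern is the right call when:

  1. You already have data platforms in place and don't want to rewrite them.
  2. AI agents need governed data access rather than raw SQL.
  3. Multiple consumers (BI, agents, reports, ML pipelines) need consistent metrics.
  4. You want analytics defined as version-controlled code.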

Related Concepts

The Semantic Sidecar pattern intersects with several adjacent ideas:

Frequently Asked Questions

What is a Semantic Sidecar?

A governed semantic layer that runs alongside your existing data platforms instead of inside a BI tool or as a centralized rewrite. It injects business semantics into AI agents, analytics workflows, and data systems through a unified API, with no architecture change.

How does a Semantic Sidecar differ from an embedded semantic layer?

An embedded semantic layer lives inside a single BI tool and only that tool can use it. A Semantic Sidecar is platform-agnostic: the same governed model serves AI agents via MCP, analytics workflows via REST, DB-API, or PostgreSQL Wire Protocol, and reporting via Apache Arrow Flight SQL. One model, many consumers, no vendor lock-in.

Why do AI agents need a Semantic Sidecar?

LLMs hallucinate SQL, pick wrong joins, and produce inconsistent metrics when given raw database schemas. A Semantic Sidecar gives them governed business concepts to query instead. The agent asks for "revenue by region last quarter" and the sidecar compiles deterministic, fan-trap-free SQL against the right tables.

How does OrionBelt implement the Semantic Sidecar pattern?

OBSL compiles declarative YAML models (OBML) into optimized SQL across 8 database dialects via a custom AST. It exposes the same governed semantics through a REST API, MCP server, Gradio UI, DB-API 2.0, Apache Arrow Flight SQL, and a PostgreSQL Wire Protocol endpoint (any psql, JDBC, ODBC, or BI client connects as if to a Postgres database). Agents and analytics tools query in business concepts; OBSL handles SQL generation, fan-trap prevention, freshness-aware caching, and audit logging.

When should I use a Semantic Sidecar instead of a traditional semantic layer?

Choose the Semantic Sidecar pattern when you already have data platforms in place and don't want to rewrite them, when AI agents need governed data access (not raw SQL), when multiple consumers (BI, agents, reports, ML pipelines) need consistent metrics, or when you want analytics defined as version-controlled code.

Try the Semantic Sidecar live

OBSL runs as a hosted demo at orionbelt.ralforion.com with a pre-loaded example model. REST API, MCP server, and Gradio playground are all open and unauthenticated.
