Pattern

Governed Text-to-SQL

Why naive text-to-SQL silently produces wrong answers on multi-fact queries, and how ontology-grounded, AST-based generation served through a Semantic Sidecar eliminates fan-traps by construction.

Why Naive Text-to-SQL Fails in Production

Hand an LLM a database schema and a natural-language question, and it will return SQL. The SQL will often run. The result will sometimes be right. In production analytical workloads, three failure modes dominate:

The Fan-Trap, Concretely

Suppose you have two fact tables: orders and support_tickets. Both relate to customers. You ask: "Total order revenue and total support tickets, by customer, last quarter."

Anti-pattern: naive text-to-SQL
SELECT c.customer_id,
       SUM(o.amount)        AS revenue,
       COUNT(t.ticket_id)   AS tickets
FROM   customers   c
JOIN   orders      o ON o.customer_id = c.customer_id
JOIN   support_tickets t ON t.customer_id = c.customer_id
WHERE  o.created_at >= '2026-01-01'
  AND  t.created_at >= '2026-01-01'
GROUP  BY c.customer_id;

This runs cleanly. It returns plausible-looking numbers. Revenue is multiplied by the number of tickets per customer. Tickets is multiplied by the number of orders per customer. The CFO will not laugh when this hits a quarterly report.

Governed pattern: split via Composite Fact Layer
-- OBSL detects multi-fact and emits:
SELECT customer_id, SUM(revenue) AS revenue, MAX(tickets) AS tickets
FROM (
  SELECT customer_id, SUM(amount) AS revenue, NULL AS tickets
  FROM   orders
  WHERE  created_at >= '2026-01-01'
  GROUP  BY customer_id

  UNION ALL

  SELECT customer_id, NULL AS revenue, COUNT(*) AS tickets
  FROM   support_tickets
  WHERE  created_at >= '2026-01-01'
  GROUP  BY customer_id
)
GROUP BY customer_id;

The key insight: fan-trap prevention is a property of the SQL generator, not the LLM. You cannot prompt-engineer your way out of a fan-trap reliably. You need a generator that recognizes multi-fact queries and splits them by construction.

From Schema-Grounded to Ontology-Grounded

The deeper fix is to stop giving the LLM raw schema. Schemas are noisy: column names like fk_user_id_v2, surrogate keys, audit columns, deprecated tables. An ontology, by contrast, models concepts: Customer, Order, Revenue, with declared relationships and constraints.

GraphRAG: discover the right concepts

OrionBelt Analytics generates an RDF/OWL ontology from your database schema and serves it via an MCP server. When the agent asks a question, it queries the ontology graph (up to 12-hop traversal) combined with vector embeddings (ChromaDB) for semantic schema discovery. The agent finds the right concepts, with the right join paths, before any SQL is written.

OBQC: validate before execution

Even with concept-level grounding, you want a deterministic safety net. OBQC (the Ontology-Based Query Checker) validates candidate SQL against the ontology and catches fan-traps, bad joins, and type mismatches before the query runs. Failure is loud and structured, not silent and statistical.

OBSL: compile via a custom SQL AST

Once the concepts are known, OrionBelt Semantic Layer compiles the request through its custom SQL Abstract Syntax Tree, emitting dialect-specific SQL for PostgreSQL, Snowflake, BigQuery, ClickHouse, Databricks, DuckDB/MotherDuck, Dremio, or MySQL. Because the AST is custom (not string templating), the output is guaranteed syntactically valid and injection-safe.

Serving Governed Text-to-SQL Through MCP and Postgres Wire

The transport matters. Exposing governed text-to-SQL through the Model Context Protocol means:

For tools that don't speak MCP, OBSL also exposes a PostgreSQL Wire Protocol endpoint. Any psql, JDBC, or ODBC client (and therefore most BI tools) can connect to OBSL as if it were a Postgres database. The same governed semantic model, the same fan-trap-free SQL generation, but reachable from anything that knows how to talk to Postgres.

Naive vs Governed Text-to-SQL

Property Naive text-to-SQL Governed text-to-SQL (OrionBelt)
GroundingRaw schemaRDF/OWL ontology + semantic model
SQL constructionString generation by LLMCustom SQL AST, dialect-specific emitter
Fan-trap handlingSilent over-countDetected and split via CFL
Multi-table joinsHopefulFollows declared join paths
ValidationRun and prayOBQC validates before execution
DeterminismVaries per callSame input, same output
Injection safetyDepends on promptGuaranteed by AST
AuditQuestion text onlyQuestion + compiled SQL + plan + warnings + timing

Frequently Asked Questions

What is a fan-trap in SQL?

A fan-trap is the SQL anti-pattern where joining two independent fact tables through a shared dimension multiplies rows and produces silent over-counts. Naive text-to-SQL falls into this trap routinely on multi-fact analytical queries.

Why does naive text-to-SQL fail on production analytical queries?

It gives the LLM raw schema and asks it to write SQL. The LLM doesn't see business rules, doesn't know your join paths, and silently fan-traps on multi-fact queries. The output is non-deterministic, prone to hallucinated columns, and unauditable.

How does ontology-grounded text-to-SQL differ?

It replaces "give the LLM the schema" with "give the LLM a semantic graph of the schema". The agent navigates concepts and relationships instead of guessing tables and joins. Result: correct multi-table joins, no fan-traps, queries that respect declared business rules.

How does OrionBelt prevent fan-traps?

OBSL detects multi-fact queries via the Composite Fact Layer (CFL), splits them, runs each independently, and combines via UNION ALL. SQL is built through a custom AST, so output is guaranteed correct and injection-safe. OrionBelt Analytics adds OBQC, a deterministic ontology-based validator that catches fan-traps, bad joins, and type mismatches before the query ever runs.

Why MCP for text-to-SQL?

MCP standardizes how AI clients call remote tools. Exposing governed text-to-SQL via MCP means Claude, ChatGPT, Copilot, Cursor, Windsurf, and other MCP clients all get the same correct behavior. The agent calls a semantic query tool instead of writing freeform SQL.

Try governed text-to-SQL live

Point any MCP client at https://orionbelt.ralforion.com/mcp, or open the Gradio playground in your browser to query a pre-loaded example model.

Live Demo GitHub Contact RALFORION