The Problem with Giving AI Agents Raw Data
When you point an AI agent at a raw database schema and ask it to write SQL, three things go wrong, usually all at once:
- Hallucinated columns and joins. The agent invents column names that don't exist, or joins tables on keys that have no referential meaning.
- Inconsistent metrics. "Revenue" is computed differently each time, depending on which CTE the agent decided to write today.
- Fan-trap over-counts. When a query joins two independent fact tables through a shared dimension, each fact row is repeated once per matching row in the other fact table, so naive sums are inflated. The number that comes back looks plausible. It is wrong.
Production-grade Agentic AI cannot ship on this foundation. You need governed AI data access: a layer between the agent and the database that enforces the right definitions, the right joins, and the right grain.
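To make the fan-trap failure concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables (customers, orders, refunds) and their values are made up for illustration. The naive query joins both fact tables to the shared dimension in one pass; the split query aggregates each fact table to the customer grain first and only then joins.

```python
import sqlite3

# Illustrative schema: one shared dimension and two independent fact tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders  (customer_id INTEGER, amount REAL);
    CREATE TABLE refunds (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Acme');
    INSERT INTO orders  VALUES (1, 100), (1, 200);
    INSERT INTO refunds VALUES (1, 10), (1, 20), (1, 30);
""")

# Naive: 2 orders x 3 refunds = 6 joined rows, so both sums are inflated.
naive_sql = """
    SELECT c.name, SUM(o.amount) AS order_total, SUM(r.amount) AS refund_total
    FROM customers c
    JOIN orders o  ON o.customer_id = c.id
    JOIN refunds r ON r.customer_id = c.id
    GROUP BY c.name
"""

# Split: aggregate each fact table to the customer grain, then join.
split_sql = """
    WITH o AS (SELECT customer_id, SUM(amount) AS order_total
               FROM orders GROUP BY customer_id),
         r AS (SELECT customer_id, SUM(amount) AS refund_total
               FROM refunds GROUP BY customer_id)
    SELECT c.name, o.order_total, r.refund_total
    FROM customers c
    JOIN o ON o.customer_id = c.id
    JOIN r ON r.customer_id = c.id
"""

naive = conn.execute(naive_sql).fetchone()
split = conn.execute(split_sql).fetchone()
print("naive:", naive)  # ('Acme', 900.0, 120.0)
print("split:", split)  # ('Acme', 300.0, 60.0)
```

Both queries run without error and both return one tidy row, which is why the over-count tends to slip through review.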
What "Governed AI Data Access" Means
Governed AI data access is the property that any agent, asking for any business concept, gets the same correct answer every time. It has four components:
- Semantics: business concepts (dimensions, measures, metrics) are defined once, in version control, and the agent queries them by name instead of inventing them from the schema.
- Deterministic SQL generation: the same agent question produces the same SQL. No CTE roulette, no string templating, no injection vectors.
- Correctness guarantees: fan traps are detected and split. Joins follow declared paths. Grain is preserved.
- Audit: every query is logged with the compiled SQL, the query plan, the warnings, and the timing.
Why this matters now: as agents take more autonomous actions on behalf of users, "the LLM wrote the SQL" is no longer an acceptable answer in a postmortem. The agent's data access has to be as reproducible as the rest of your stack.
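The first two components can be sketched in a few lines of Python. This is an illustration under stated assumptions, not OBSL's compiler: the model contents, the `compile_query` function, and the table and column names are all hypothetical, and where OBSL is described as compiling through an AST, this toy version simply assembles a string from vetted definitions. The point it shows is that the agent supplies names only, unknown names are rejected, and the same request always yields the same SQL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measure:
    name: str
    expression: str  # SQL aggregate, written once by the model author


@dataclass(frozen=True)
class Dimension:
    name: str
    column: str


# Hypothetical semantic model; in practice this would live in version control.
FACT_TABLE = "orders"
MEASURES = {"revenue": Measure("revenue", "SUM(orders.amount)")}
DIMENSIONS = {
    "region": Dimension("region", "orders.region"),
    "month": Dimension("month", "orders.order_month"),
}


def compile_query(measure: str, dimensions: list[str]) -> str:
    """Compile a request made of model names into SQL, or raise on unknown names."""
    if measure not in MEASURES:
        raise KeyError(f"unknown measure: {measure}")
    unknown = [d for d in dimensions if d not in DIMENSIONS]
    if unknown:
        raise KeyError(f"unknown dimensions: {unknown}")
    # Sort and de-duplicate so the request order cannot change the output.
    dims = [DIMENSIONS[d] for d in sorted(set(dimensions))]
    select = [f"{d.column} AS {d.name}" for d in dims]
    select.append(f"{MEASURES[measure].expression} AS {measure}")
    sql = f"SELECT {', '.join(select)} FROM {FACT_TABLE}"
    if dims:
        sql += " GROUP BY " + ", ".join(d.column for d in dims)
    return sql


print(compile_query("revenue", ["region", "month"]))
print(compile_query("revenue", ["month", "region"]))  # identical SQL
```

Only identifiers from the model ever reach the SQL text, so a hallucinated column fails fast with an error instead of producing a query.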
Why MCP is the Right Transport
The Model Context Protocol (MCP) is an open protocol that lets AI clients discover and call tools on remote servers. For data access, MCP gives you several wins that ad-hoc integrations don't:
- Client portability: Claude, ChatGPT, Copilot, Cursor, Windsurf, and others all speak MCP. The same data access server works across all of them.
- Tool descriptions: the MCP server describes its tools in a machine-readable way, so the agent picks the right tool for the right question instead of free-styling SQL.
- Streamable HTTP: can run stateless, so it scales horizontally and deploys behind any reverse proxy. No bespoke WebSocket plumbing.
- Sampling: the server can request LLM completions from the client when needed (e.g. for natural-language metric naming).
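As a rough illustration of the tool-description point: MCP messages are JSON-RPC 2.0, a server lists its tools in response to `tools/list`, and a client invokes one with `tools/call`. The sketch below builds such a call in Python. The tool name `query_metric`, its description, and its arguments are hypothetical and are not taken from OrionBelt's actual tool list.

```python
import json

# Hypothetical tool description, shaped like one entry of a tools/list result.
query_tool = {
    "name": "query_metric",
    "description": "Return a governed metric grouped by the given dimensions.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "metric": {"type": "string"},
            "dimensions": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["metric"],
    },
}


def tools_call_request(request_id: int, tool: dict, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 tools/call request after a minimal required-field check."""
    required = tool["inputSchema"].get("required", [])
    missing = [key for key in required if key not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool["name"], "arguments": arguments},
    }


request = tools_call_request(
    1, query_tool, {"metric": "monthly_active_users", "dimensions": ["region"]}
)
print(json.dumps(request, indent=2))
```

The agent never sends SQL in this exchange; it sends a tool name and structured arguments, and the server decides what to run.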
How OrionBelt Provides Agentic AI Data Access
OrionBelt Semantic Layer (OBSL) exposes a governed semantic model via an MCP server. The flow looks like this:
- You define your business model in declarative YAML (OBML): dimensions, measures, metrics, joins, business rules.
- OBSL loads the model on startup and reports model health via the agent-facing API.
- An agent connects to `/mcp` via Streamable HTTP and discovers the available tools.
- The agent asks for a business concept (e.g. "monthly active users by region for Q1").
- OBSL compiles deterministic SQL via its custom AST, runs it against the source database, and returns the result with the compiled SQL, query plan, and any warning codes.
- The result is cached with freshness awareness, so a follow-up question hits the cache until the underlying data changes.
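The last step, caching with freshness awareness, could be sketched as follows. How OBSL actually detects that the underlying data changed is not described here, so this example assumes a hypothetical "data version" token (for instance, the identifier of the latest load) supplied by a callback. The `FreshnessCache` class and all names in it are illustrative.

```python
from typing import Any, Callable


class FreshnessCache:
    """Cache results by compiled SQL; reuse them only while the data version is unchanged."""

    def __init__(self, current_version: Callable[[], str]):
        # current_version is an assumed hook that reports the source data's version.
        self._current_version = current_version
        self._entries: dict[str, tuple[str, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_run(self, sql: str, run: Callable[[str], Any]) -> Any:
        version = self._current_version()
        cached = self._entries.get(sql)
        if cached is not None and cached[0] == version:
            self.hits += 1
            return cached[1]
        # Either never seen, or the data changed since the result was stored.
        self.misses += 1
        result = run(sql)
        self._entries[sql] = (version, result)
        return result


# Stand-ins for the real database and its version signal.
state = {"version": "load-001"}
cache = FreshnessCache(lambda: state["version"])


def fake_run(sql: str) -> str:
    return f"rows for [{sql}] at {state['version']}"


first = cache.get_or_run("SELECT 1", fake_run)   # miss: nothing cached yet
second = cache.get_or_run("SELECT 1", fake_run)  # hit: same data version
state["version"] = "load-002"                    # underlying data changes
third = cache.get_or_run("SELECT 1", fake_run)   # miss: cached entry is stale
print(cache.hits, cache.misses)
```

Keying on the compiled SQL only works because compilation is deterministic: two agents asking the same question produce the same key.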
Complementary tools in the OrionBelt platform
- OrionBelt Analytics: an MCP server that auto-generates RDF/OWL ontologies from database schemas, with GraphRAG for semantic schema discovery (12-hop graph traversal + ChromaDB vector embeddings) and OBQC for deterministic SQL validation (catches fan traps, bad joins, and type mismatches before execution).
- OrionBelt Chat: a conversational AI interface that connects to both MCP servers (Analytics + Semantic Layer), with support for 300+ models via OpenRouter and local LLMs via MLX/Ollama.
- OrionBelt Runner: for non-interactive access, it runs YAML-defined query batches against the Semantic Layer and emits Markdown, HTML, or PDF reports with audit-grade YAML run-logs.
Patterns vs Anti-Patterns
| Anti-pattern | Governed pattern |
|---|---|
| Give the agent the raw schema and let it write SQL. | Give the agent governed tools that accept business concepts and compile SQL deterministically. |
| Define "revenue" once per agent prompt. | Define "revenue" once in the semantic model; every agent uses the same definition. |
| Hope the LLM doesn't fan-trap. | Detect multi-fact queries and split them via the Composite Fact Layer. |
| Log only the agent's natural-language question. | Log the question, the compiled SQL, the query plan, warnings, and timing. |
| Bespoke integration per AI client. | One MCP server, every MCP-capable client. |
Frequently Asked Questions
What is Agentic AI data access?
The set of patterns, protocols, and guardrails that let autonomous AI agents query enterprise data correctly and safely. It goes beyond text-to-SQL: it includes governed semantics, fan-trap prevention, schema discovery, audit trails, and access through standard protocols like MCP.
Why is raw text-to-SQL insufficient for AI agents?
Raw text-to-SQL gives the LLM your database schema and hopes it picks the right tables, joins, and filters. In practice it hallucinates columns, ignores business rules, produces fan-trap over-counts on multi-fact queries, and gives different answers on different runs.
What is MCP (Model Context Protocol) and why does it matter for data access?
MCP is an open protocol from Anthropic that lets AI clients discover and call tools on remote servers. For data access, an MCP server can expose governed query tools, schema discovery, and metric calculation. The agent sees stable, semantically-rich tools instead of a raw database connection.
How does OrionBelt provide governed Agentic AI data access?
OBSL exposes a governed semantic model via an MCP server. AI agents query in business concepts and OBSL compiles deterministic, fan-trap-free SQL across 8 database dialects. OrionBelt Analytics complements this with GraphRAG for ontology-based schema discovery and OBQC for SQL validation.
What about audit and compliance?
Every query through OBSL is logged with the compiled SQL, query plan, warnings, and timing. The semantic model itself lives in version control. Combined with OrionBelt Runner's YAML run-log sidecar, you get a full audit trail of who asked what, what SQL ran, and what the result was.
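For a sense of what one such audit record might contain, here is a minimal Python sketch that runs a query against an in-memory SQLite database and emits a single JSON line with the question, compiled SQL, query plan, warnings, and timing. The field names and the `run_with_audit` helper are assumptions for illustration, not OBSL's log format.

```python
import json
import sqlite3
import time


def run_with_audit(conn, question, sql, warnings):
    """Run sql and return (rows, audit_line), where audit_line is one JSON document."""
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    started = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    elapsed_ms = round((time.perf_counter() - started) * 1000, 3)
    record = {
        "question": question,
        "compiled_sql": sql,
        "query_plan": plan,
        "warnings": warnings,
        "elapsed_ms": elapsed_ms,
    }
    return rows, json.dumps(record, sort_keys=True)


conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE events (user_id INTEGER);"
    "INSERT INTO events VALUES (1), (2), (3);"
)
rows, audit_line = run_with_audit(
    conn, "How many events?", "SELECT COUNT(*) FROM events", []
)
print(audit_line)
```

Appending one such line per query gives a log that can be replayed: the compiled SQL is stored verbatim, so nobody has to reconstruct what the agent "meant".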