SQL Dialects¶
OrionBelt compiles semantic queries into SQL for five database dialects. Each dialect has its own identifier quoting, function names, and SQL syntax. The plugin architecture allows adding new dialects without modifying the core compiler.
Supported Dialects¶
| Dialect | Identifier | Description |
|---|---|---|
| PostgreSQL | postgres |
Standard PostgreSQL with strict GROUP BY |
| Snowflake | snowflake |
Cloud data warehouse with QUALIFY, semi-structured types |
| ClickHouse | clickhouse |
Column-oriented OLAP with custom date/aggregation functions |
| Dremio | dremio |
Data lakehouse with reduced function surface |
| Databricks SQL | databricks |
Spark SQL semantics with backtick identifiers |
Capabilities Matrix¶
Each dialect declares capability flags that the compiler uses to choose SQL generation strategies.
| Capability | Postgres | Snowflake | ClickHouse | Dremio | Databricks |
|---|---|---|---|---|---|
supports_cte |
Yes | Yes | Yes | Yes | Yes |
supports_qualify |
No | Yes | No | No | No |
supports_arrays |
Yes | Yes | Yes | No | Yes |
supports_window_filters |
No | Yes | No | No | No |
supports_ilike |
Yes | Yes | Yes | No | No |
supports_time_travel |
No | Yes | No | No | No |
supports_semi_structured |
No | Yes | No | No | No |
Identifier Quoting¶
| Dialect | Style | Example |
|---|---|---|
| Postgres | Double quotes | "column_name" |
| Snowflake | Double quotes | "column_name" |
| ClickHouse | Double quotes | "column_name" |
| Dremio | Double quotes | "column_name" |
| Databricks | Backticks | `column_name` |
Time Grain Functions¶
The timeGrain is rendered differently per dialect:
String Contains¶
The contains filter operator is rendered per dialect:
CAST Handling¶
Aggregation Functions¶
Most aggregations (SUM, COUNT, AVG, MIN, MAX) compile identically across dialects. The following aggregations require dialect-specific rendering:
ANY_VALUE¶
| Dialect | SQL |
|---|---|
| Postgres | ANY_VALUE(col) |
| Snowflake | ANY_VALUE(col) |
| ClickHouse | any(col) |
| Dremio | ANY_VALUE(col) |
| Databricks | ANY_VALUE(col) |
MEDIAN¶
| Dialect | SQL |
|---|---|
| Postgres | PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY col) |
| Snowflake | MEDIAN(col) |
| ClickHouse | MEDIAN(col) |
| Dremio | MEDIAN(col) |
| Databricks | MEDIAN(col) |
MODE¶
| Dialect | SQL |
|---|---|
| Postgres | MODE() WITHIN GROUP (ORDER BY col) |
| Snowflake | MODE(col) |
| ClickHouse | topK(1)(col)[1] |
| Dremio | Not supported |
| Databricks | MODE(col) |
LISTAGG¶
| Dialect | Base | + DISTINCT | + ORDER BY |
|---|---|---|---|
| Postgres | STRING_AGG(col, sep) |
STRING_AGG(DISTINCT col, sep) |
STRING_AGG(col, sep ORDER BY col) |
| Snowflake | LISTAGG(col, sep) |
LISTAGG(DISTINCT col, sep) |
LISTAGG(col, sep) WITHIN GROUP (ORDER BY col) |
| ClickHouse | arrayStringConcat(groupArray(col), sep) |
arrayStringConcat(groupUniqArray(col), sep) |
arrayStringConcat(arraySort(groupArray(col)), sep) |
| Dremio | LISTAGG(col, sep) |
LISTAGG(DISTINCT col, sep) |
LISTAGG(col, sep) WITHIN GROUP (ORDER BY col) |
| Databricks | ARRAY_JOIN(COLLECT_LIST(col), sep) |
ARRAY_JOIN(COLLECT_SET(col), sep) |
ARRAY_JOIN(SORT_ARRAY(COLLECT_LIST(col)), sep) |
LISTAGG ordering limitations
ClickHouse and Databricks only support self-ordering (sorting by the aggregated column). Ordering by a different column raises an error at compile time.
Total not supported
MEDIAN, MODE, LISTAGG, and ANY_VALUE do not support total: true because they cannot be meaningfully re-aggregated via window functions.
Dialect Plugin Architecture¶
Each dialect implements the abstract Dialect base class:
class Dialect(ABC):
@property
@abstractmethod
def name(self) -> str: ...
@property
@abstractmethod
def capabilities(self) -> DialectCapabilities: ...
@abstractmethod
def quote_identifier(self, name: str) -> str: ...
@abstractmethod
def render_time_grain(self, column: Expr, grain: TimeGrain) -> Expr: ...
@abstractmethod
def render_cast(self, expr: Expr, target_type: str) -> Expr: ...
def render_string_contains(self, column: Expr, pattern: Expr) -> Expr: ...
def compile(self, ast: Select) -> str: ...
Dialects register themselves via the @DialectRegistry.register decorator:
@DialectRegistry.register
class PostgresDialect(Dialect):
@property
def name(self) -> str:
return "postgres"
...
The registry provides lookup by name:
from orionbelt.dialect.registry import DialectRegistry
dialect = DialectRegistry.get("snowflake")
sql = dialect.compile(ast)
Adding a New Dialect¶
- Create
src/orionbelt/dialect/my_dialect.py - Subclass
Dialectand implement all abstract methods - Decorate with
@DialectRegistry.register - The dialect is automatically available via
DialectRegistry.get("my_dialect")