SQL Dialects¶

OrionBelt compiles semantic queries into SQL for five database dialects. Each dialect has its own identifier quoting, function names, and SQL syntax. The plugin architecture allows adding new dialects without modifying the core compiler.

Supported Dialects¶

Dialect	Identifier	Description
PostgreSQL	`postgres`	Standard PostgreSQL with strict GROUP BY
Snowflake	`snowflake`	Cloud data warehouse with QUALIFY, semi-structured types
ClickHouse	`clickhouse`	Column-oriented OLAP with custom date/aggregation functions
Dremio	`dremio`	Data lakehouse with reduced function surface
Databricks SQL	`databricks`	Spark SQL semantics with backtick identifiers

Capabilities Matrix¶

Each dialect declares capability flags that the compiler uses to choose SQL generation strategies.

Capability	Postgres	Snowflake	ClickHouse	Dremio	Databricks
`supports_cte`	Yes	Yes	Yes	Yes	Yes
`supports_qualify`	No	Yes	No	No	No
`supports_arrays`	Yes	Yes	Yes	No	Yes
`supports_window_filters`	No	Yes	No	No	No
`supports_ilike`	Yes	Yes	Yes	No	No
`supports_time_travel`	No	Yes	No	No	No
`supports_semi_structured`	No	Yes	No	No	No

Identifier Quoting¶

Dialect	Style	Example
Postgres	Double quotes	`"column_name"`
Snowflake	Double quotes	`"column_name"`
ClickHouse	Double quotes	`"column_name"`
Dremio	Double quotes	`"column_name"`
Databricks	Backticks	`column_name`

Time Grain Functions¶

The timeGrain is rendered differently per dialect:

PostgresSnowflakeClickHouseDremioDatabricks

date_trunc('month', "order_date")
date_trunc('year', "order_date")
date_trunc('quarter', "order_date")

DATE_TRUNC('month', "order_date")
DATE_TRUNC('year', "order_date")
DATE_TRUNC('quarter', "order_date")

toStartOfMonth("order_date")
toStartOfYear("order_date")
toStartOfQuarter("order_date")
toMonday("order_date")        -- week
toDate("order_date")          -- day
toStartOfHour("order_date")
toStartOfMinute("order_date")
toStartOfSecond("order_date")

DATE_TRUNC('month', "order_date")
DATE_TRUNC('year', "order_date")

date_trunc('month', `order_date`)
date_trunc('year', `order_date`)

String Contains¶

The contains filter operator is rendered per dialect:

PostgresSnowflakeClickHouseDremioDatabricks

"column" ILIKE '%' || 'search' || '%'

CONTAINS("column", 'search')

"column" ILIKE '%' || 'search' || '%'

LOWER("column") LIKE '%' || LOWER('search') || '%'

lower(`column`) LIKE '%' || lower('search') || '%'

CAST Handling¶

Postgres / Snowflake / Dremio / DatabricksClickHouse

CAST(expr AS INTEGER)
CAST(expr AS VARCHAR)
CAST(expr AS DATE)

ClickHouse uses native conversion functions:

toInt64(expr)      -- int / integer
toFloat64(expr)    -- float / double
toString(expr)     -- string / varchar
toDate(expr)       -- date
-- Other types fall back to CAST
CAST(expr AS DateTime)

Aggregation Functions¶

Most aggregations (SUM, COUNT, AVG, MIN, MAX) compile identically across dialects. The following aggregations require dialect-specific rendering:

ANY_VALUE¶

Dialect	SQL
Postgres	`ANY_VALUE(col)`
Snowflake	`ANY_VALUE(col)`
ClickHouse	`any(col)`
Dremio	`ANY_VALUE(col)`
Databricks	`ANY_VALUE(col)`

MEDIAN¶

Dialect	SQL
Postgres	`PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY col)`
Snowflake	`MEDIAN(col)`
ClickHouse	`MEDIAN(col)`
Dremio	`MEDIAN(col)`
Databricks	`MEDIAN(col)`

MODE¶

Dialect	SQL
Postgres	`MODE() WITHIN GROUP (ORDER BY col)`
Snowflake	`MODE(col)`
ClickHouse	`topK(1)(col)[1]`
Dremio	Not supported
Databricks	`MODE(col)`

LISTAGG¶

Dialect	Base	+ DISTINCT	+ ORDER BY
Postgres	`STRING_AGG(col, sep)`	`STRING_AGG(DISTINCT col, sep)`	`STRING_AGG(col, sep ORDER BY col)`
Snowflake	`LISTAGG(col, sep)`	`LISTAGG(DISTINCT col, sep)`	`LISTAGG(col, sep) WITHIN GROUP (ORDER BY col)`
ClickHouse	`arrayStringConcat(groupArray(col), sep)`	`arrayStringConcat(groupUniqArray(col), sep)`	`arrayStringConcat(arraySort(groupArray(col)), sep)`
Dremio	`LISTAGG(col, sep)`	`LISTAGG(DISTINCT col, sep)`	`LISTAGG(col, sep) WITHIN GROUP (ORDER BY col)`
Databricks	`ARRAY_JOIN(COLLECT_LIST(col), sep)`	`ARRAY_JOIN(COLLECT_SET(col), sep)`	`ARRAY_JOIN(SORT_ARRAY(COLLECT_LIST(col)), sep)`

LISTAGG ordering limitations

ClickHouse and Databricks only support self-ordering (sorting by the aggregated column). Ordering by a different column raises an error at compile time.

Total not supported

MEDIAN, MODE, LISTAGG, and ANY_VALUE do not support total: true because they cannot be meaningfully re-aggregated via window functions.

Dialect Plugin Architecture¶

Each dialect implements the abstract Dialect base class:

class Dialect(ABC):
    @property
    @abstractmethod
    def name(self) -> str: ...

    @property
    @abstractmethod
    def capabilities(self) -> DialectCapabilities: ...

    @abstractmethod
    def quote_identifier(self, name: str) -> str: ...

    @abstractmethod
    def render_time_grain(self, column: Expr, grain: TimeGrain) -> Expr: ...

    @abstractmethod
    def render_cast(self, expr: Expr, target_type: str) -> Expr: ...

    def render_string_contains(self, column: Expr, pattern: Expr) -> Expr: ...

    def compile(self, ast: Select) -> str: ...

Dialects register themselves via the @DialectRegistry.register decorator:

@DialectRegistry.register
class PostgresDialect(Dialect):
    @property
    def name(self) -> str:
        return "postgres"
    ...

The registry provides lookup by name:

from orionbelt.dialect.registry import DialectRegistry

dialect = DialectRegistry.get("snowflake")
sql = dialect.compile(ast)

Adding a New Dialect¶

Create src/orionbelt/dialect/my_dialect.py
Subclass Dialect and implement all abstract methods
Decorate with @DialectRegistry.register
The dialect is automatically available via DialectRegistry.get("my_dialect")

Querying Dialect Info via API¶

curl http://127.0.0.1:8000/dialects

{
  "dialects": [
    {
      "name": "postgres",
      "capabilities": {
        "supports_cte": true,
        "supports_qualify": false,
        "supports_arrays": true,
        "supports_window_filters": false,
        "supports_ilike": true,
        "supports_time_travel": false,
        "supports_semi_structured": false
      }
    },
    ...
  ]
}