
Query Builder

The Query Builder is the admin-side ad-hoc SQL tool at /app/querybuilder. It talks directly to the ClickHouse cluster that stores your org's events, with org and environment scoping enforced at the engine level so you can't accidentally (or deliberately) read another tenant's data.

Admin-only. The builder runs raw ClickHouse SELECT queries. Access requires an admin-scoped session. Viewers and editors see the page in the sidebar but get a 403 when they try to run a query.

What it's for

Ad-hoc debugging and verification against the same data your dashboards read: cross-checking a widget's numbers, running health checks, answering one-off customer questions, and exploring the event schema, all without leaving the admin app.

Three editing modes

Visual: a point-and-click builder. Pick a dataset, measures, dimensions, filters, and grain; the page generates the SQL for you and shows it in the preview pane. Useful for learning the event schema.
SQL: a full CodeMirror editor. Straight-up SQL with schema-aware autocomplete (columns for the open dataset plus SQL keywords), a Format button powered by sql-formatter, and a side panel showing the full dataset catalog.
Split: both at once. Visual controls on the left, live SQL on the right. Tweaking a dimension updates the SQL in place, which is good for learning ClickHouse idioms by watching the generator handle edge cases.
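
For a feel of what the generator emits (illustrative; the exact SQL may differ), picking emban.events with a count() measure, a tenant_id dimension, and a daily grain yields something like:

SELECT
    toStartOfDay(timestamp) AS day,  -- grain
    tenant_id,                       -- dimension
    count() AS events                -- measure
FROM emban.events
WHERE timestamp > now() - INTERVAL 7 DAY
GROUP BY day, tenant_id
ORDER BY day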

The schema browser

The left rail lists every table and materialized view your org can read, with its columns and types. The list comes from GET /v1/datasets. Clicking a dataset expands its columns; clicking a column inserts it into the visual builder's dimension or measure field depending on the active tab.
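
The exact payload isn't documented on this page, so treat this response shape as illustrative only (the column names and types are examples pulled from this doc, not the full schema):

GET /v1/datasets
  → [
      {
        "name": "emban.events",
        "columns": [
          {"name": "timestamp", "type": "DateTime64(3, 'UTC')"},
          {"name": "tenant_id", "type": "String"},
          {"name": "org_id", "type": "LowCardinality(String)"}
        ]
      },
      ...
    ]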

Long type annotations are abbreviated in the list — DateTime64(3, 'UTC') shows as DateTime64, LowCardinality(String) shows as Lc(String), Map(String, Float64) shows as Map. Hover over the column row to see the full type in the tooltip. Use Datasets → View DDL (or GET /v1/datasets/{name}/ddl) when you need the exact CREATE TABLE statement.
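
What a DDL response looks like, trimmed to a sketch (the real schema has more columns, and the engine and sort key shown here are guesses consistent with this page, not the production definition):

GET /v1/datasets/emban.events/ddl
  → CREATE TABLE emban.events
    (
        timestamp   DateTime64(3, 'UTC'),
        org_id      LowCardinality(String),
        environment LowCardinality(String),
        tenant_id   String
    )
    ENGINE = MergeTree
    ORDER BY timestamp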

Running a query

Hit the Run button (or the keyboard shortcut shown in the toolbar) to execute. The response tab below shows the result grid along with the run's stats: wall-clock time, rows returned, and whether the result was truncated by the row limit.

The result pane also offers Copy CSV and Copy JSON for pasting into a spreadsheet or notebook. History (per-browser, last 30 runs) lives under the History tab on the left rail and is kept in localStorage.

Limits

SQL length: 8192 characters. Enforced before the query reaches ClickHouse; over-length SQL is rejected with 400 "sql too long".
Execution time: 5 seconds. ClickHouse max_execution_time=5; expensive queries get killed at the engine, and you see a readable error back.
Result rows: 10,000. ClickHouse max_result_rows=10000 with result_overflow_mode=break; anything beyond is silently dropped and the truncated stat flips to true.
Statement type: SELECT or WITH only. Writes are blocked at two layers: a keyword pre-check and readonly=2 on the ClickHouse session.
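
For example, an attempted mutation is refused at the keyword pre-check before it ever reaches ClickHouse (the request/response framing below is illustrative; only the status code and message are specified above):

# Blocked write (illustrative)
POST /v1/query/adhoc
{"sql": "DROP TABLE emban.events"}
  → 400 "only SELECT queries are allowed"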

Security model

Every ad-hoc query runs with a ClickHouse additional_table_filters setting that pins emban.events to the caller's org_id and environment. The filter is injected at SETTINGS level, not concatenated into the SQL, so it applies to every read of the table (subqueries, JOINs, and CTEs included) and can't be commented out, UNION'd around, or otherwise defeated by anything in the submitted query text.
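
To make that concrete: the engine-applied scoping is roughly equivalent to your query running with a setting like the one below. The values are placeholders, and you never write this yourself:

SELECT tenant_id, count()
FROM emban.events
WHERE timestamp > now() - INTERVAL 1 DAY
GROUP BY tenant_id
SETTINGS additional_table_filters = {'emban.events': "org_id = 'org_abc' AND environment = 'production'"}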

Writes are blocked by readonly=2 (all mutations forbidden, SETTINGS overrides still allowed for our filter injection). A query that tries to INSERT, ALTER, DROP, TRUNCATE, or OPTIMIZE is rejected with 400 "only SELECT queries are allowed" before it ever reaches ClickHouse.

What this means for you. If you're debugging a widget that seems to be missing data, you don't need to add WHERE org_id = '...' to your SQL — the engine adds it for you. Cross-checking a widget's numbers against raw SQL gives you apples-to-apples results without manual scoping.

Saved queries

Saved queries live in Postgres (the saved_queries table), scoped to the org. Anyone in the org can list and load them; only admins can create, update, delete, or star. Each saved query records name, sql, dataset (for UX context), starred, created_by, and updated_at.

# List (auth required)
GET /v1/saved-queries
  → [{"id":12,"name":"top tenants","sql":"SELECT...","starred":true,...}]

# Create (admin only)
POST /v1/saved-queries
{
  "name": "Top tenants by volume (7d)",
  "sql": "SELECT tenant_id, count() FROM emban.events WHERE timestamp > now() - INTERVAL 7 DAY GROUP BY tenant_id ORDER BY 2 DESC LIMIT 10",
  "dataset": "emban.events"
}

# Toggle star or rename
PATCH /v1/saved-queries/12
{"starred": true}

# Remove
DELETE /v1/saved-queries/12

Starred queries float to the top of the sidebar list for everyone in the org. Use this for the 3–5 queries your team actually reaches for — health checks, pipeline sanity, common customer questions.

Audit trail

Every successful ad-hoc run writes a query.adhoc.run row into audit_log:

{
  "action": "query.adhoc.run",
  "actor": "admin@example.com",
  "org_id": "org_abc",
  "metadata": {
    "sql_excerpt": "SELECT tenant_id, count() FROM emban.events ...",
    "rows": 847,
    "wall_ms": 218,
    "truncated": false
  },
  "created_at": "2026-04-24T14:30:00Z"
}

The sql_excerpt is the first 512 characters of the submitted SQL — enough to reconstruct “who ran what” without bloating the audit table. Admins can browse this under Admin → Audit and filter by actor or time range.
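
If you have direct access to the backing store, the same trail is queryable. This sketch assumes audit_log is a Postgres table with a JSONB metadata column, which this page doesn't actually specify:

-- Slowest ad-hoc runs in the last 7 days (illustrative; assumes Postgres + JSONB)
SELECT actor,
       metadata->>'sql_excerpt'    AS sql_excerpt,
       (metadata->>'wall_ms')::int AS wall_ms
FROM audit_log
WHERE action = 'query.adhoc.run'
  AND created_at > now() - interval '7 days'
ORDER BY wall_ms DESC
LIMIT 20;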

Design patterns

Pattern 1: Use Visual mode to learn, SQL mode to ship. Visual is the fastest way to discover which columns exist and how to aggregate them. Once you have the shape you want, copy the generated SQL into SQL mode and tighten it (explicit column lists, better aliases, pre-filtered subqueries).
Pattern 2: Save health checks, not ad-hoc questions. “Events ingested in the last hour by source” is the kind of query your team will open ten times a week. Save and star it. “Why was customer X slow on Tuesday” is a one-off; leave it in history.
Pattern 3: Cross-check widgets before publishing. Before publishing a new dashboard, run the widget's underlying aggregation in the Query Builder and compare the numbers. If they differ, you have a silent filter mismatch, almost always in tenant_id, environment, or a string_props key.
Pattern 4: Time-bound every query. Unbounded queries scan the whole events table and will hit the 5-second timeout on a busy org. Always include WHERE timestamp > now() - INTERVAL N DAY (see the sketch after this list). The event-time column is also the main sort key, so this is the cheapest possible filter.
Pattern 5: Don't hand-write the scope filter. You don't need AND org_id = '...' or AND environment = '...'; the engine adds both for you. Adding them yourself is harmless (the filters compound), but your SQL reads cleaner without them.
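
Pattern 4 in SQL form, using the columns from the examples above (pick N to fit the question):

-- Unbounded: scans the whole events table and will likely hit the 5-second timeout
SELECT count() FROM emban.events;

-- Time-bounded: prunes on the sort key, cheap even on a busy org
SELECT count()
FROM emban.events
WHERE timestamp > now() - INTERVAL 7 DAY;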

API reference

# Execute (admin only)
POST /v1/query/adhoc
Authorization: Bearer YOUR_ADMIN_API_KEY
Content-Type: application/json

{"sql": "SELECT tenant_id, count() FROM emban.events WHERE timestamp > now() - INTERVAL 1 DAY GROUP BY tenant_id"}

# Response
{
  "columns": [
    {"name": "tenant_id", "type": "String"},
    {"name": "count()", "type": "UInt64"}
  ],
  "rows": [
    ["tenant_acme", 12453],
    ["tenant_foo", 8412]
  ],
  "stats": {"wall_ms": 143, "rows_returned": 2, "truncated": false}
}

# Browse schema (any authenticated user)
GET /v1/datasets
GET /v1/datasets/{name}/ddl

# Saved queries
GET    /v1/saved-queries                   (auth)
POST   /v1/saved-queries                   (admin)
PATCH  /v1/saved-queries/{id}              (admin)
DELETE /v1/saved-queries/{id}              (admin)

Related: see Events for the underlying schema (emban.events columns, prop shapes), Dashboards for how widgets are built on top of this same data, and the API reference for every endpoint surface.