Alerts

Alerts evaluate a widget's query every 60 seconds and fire when the resulting value crosses a threshold, swings by a delta, or stops reporting data entirely. They deliver through email, webhook, or both, and every webhook attempt is recorded in the webhook_deliveries observability table.

Attach to widgets, not raw queries. An alert reuses the Query of an existing widget (dashboard widget or standalone widget from the widget library). The widget's metric, numeric prop, and event name define what is being measured — the alert only adds the condition and delivery.

Condition types

threshold
  Fires when the current value crosses threshold_value using the chosen operator (gt, lt, gte, lte).
  Required fields: operator, threshold_value

delta
  Fires when the absolute percent change vs. the previous period crosses threshold_value. Useful for spike/dip detection.
  Required fields: operator, threshold_value

no_data
  Fires when the widget query returns zero rows over the lookback window. Good for catching ingestion outages.
  Required fields: none
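
The three checks reduce to simple comparisons. A minimal TypeScript sketch of how an evaluator might apply them (the function names and the Operator type here are illustrative, not Emban's internals, and the delta math assumes "absolute percent change" exactly as defined above):

type Operator = "gt" | "lt" | "gte" | "lte";

const compare = (value: number, op: Operator, threshold: number): boolean =>
  op === "gt"  ? value >  threshold :
  op === "lt"  ? value <  threshold :
  op === "gte" ? value >= threshold :
                 value <= threshold;

// threshold: compare the scalar directly against threshold_value.
function checkThreshold(value: number, op: Operator, thresholdValue: number): boolean {
  return compare(value, op, thresholdValue);
}

// delta: compare the absolute percent change vs. the previous period.
// Assumption: pctChange = |current - previous| / previous * 100.
function checkDelta(current: number, previous: number, op: Operator, thresholdValue: number): boolean {
  if (previous === 0) return false; // no baseline, nothing to compare against
  const pctChange = Math.abs((current - previous) / previous) * 100;
  return compare(pctChange, op, thresholdValue);
}

// no_data: fires purely on an empty result set over the lookback window.
function checkNoData(rowCount: number): boolean {
  return rowCount === 0;
}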

Evaluation model

Step 1: Resolve widget. The evaluator looks up the widget by widget_id. If dashboard_id is set, it reads the inline widget from that dashboard's published config. Otherwise it reads a standalone widget from the widget library, again preferring the published snapshot so you never get paged on a draft edit.
Step 2: Execute query. The widget's Query runs against ClickHouse with a period derived from lookback_minutes, with no group_by and no granularity; the alert only cares about the scalar value.
Step 3: Apply condition. Threshold compares the value to threshold_value. Delta compares the previous-period percentage change. No-data short-circuits on empty result sets. If the condition does not hold, the alert transitions to ok.
Step 4: Fire or resolve. Firing writes an alert_events row (state=firing, value, threshold), flips last_state to firing, and triggers delivery. Resolution flips back to ok. Delivery runs only on the firing transition, not on every tick.
Step 5: Cooldown. Once an alert fires, it will not re-fire until cooldown_minutes has elapsed since last_fired_at. This suppresses paging storms without silencing the alert entirely.
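
Put together, a single evaluation tick might look like the sketch below. The names (Alert, recordEvent, deliver) are hypothetical stand-ins for the evaluator's internals, and the cooldown handling is one plausible reading of steps 4 and 5:

interface Alert {
  lastState: "ok" | "firing";
  lastFiredAt: Date | null;
  cooldownMinutes: number;
}

async function recordEvent(alert: Alert, state: "ok" | "firing"): Promise<void> {
  // placeholder: insert an alert_events row
}

async function deliver(alert: Alert): Promise<void> {
  // placeholder: email and/or webhook fan-out
}

// One tick for one alert: evaluate, transition state, deliver on the
// ok -> firing edge, and respect the cooldown window.
async function tick(alert: Alert, conditionHolds: boolean, now = new Date()): Promise<void> {
  if (conditionHolds) {
    const cooledDown =
      alert.lastFiredAt === null ||
      now.getTime() - alert.lastFiredAt.getTime() >= alert.cooldownMinutes * 60_000;
    // Fire only on the ok -> firing edge, and at most once per cooldown window.
    if (alert.lastState !== "firing" && cooledDown) {
      alert.lastState = "firing";
      alert.lastFiredAt = now;
      await recordEvent(alert, "firing");
      await deliver(alert);
    }
  } else if (alert.lastState === "firing") {
    // Resolve quietly: in this sketch, no delivery on the firing -> ok edge.
    alert.lastState = "ok";
    await recordEvent(alert, "ok");
  }
}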

Creating an alert

Create, update, and delete endpoints require an admin API key or admin-scoped session. List, get, and events endpoints work with any authenticated session.

POST /v1/alerts
Authorization: Bearer YOUR_ADMIN_API_KEY
Content-Type: application/json

{
  "dashboard_id": "dash_abc123",
  "widget_id": "w_api_calls",
  "name": "API error rate too high",
  "condition_type": "threshold",
  "operator": "gt",
  "threshold_value": 50,
  "lookback_minutes": 15,
  "email_to": ["oncall@example.com"],
  "webhook_url": "https://hooks.example.com/emban"
}

For a standalone widget (not attached to a dashboard), omit dashboard_id:

{
  "widget_id": "wgt_standalone_latency",
  "name": "p95 latency spike",
  "condition_type": "delta",
  "operator": "gt",
  "threshold_value": 25,
  "lookback_minutes": 60,
  "webhook_url": "https://hooks.example.com/emban"
}
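
Equivalently, from code. A plain fetch sketch in TypeScript; the base URL and the EMBAN_ADMIN_API_KEY environment variable are assumptions you should adapt to your deployment:

// Create the standalone-widget alert above via the API.
const res = await fetch("https://emban.example.com/v1/alerts", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.EMBAN_ADMIN_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    widget_id: "wgt_standalone_latency",
    name: "p95 latency spike",
    condition_type: "delta",
    operator: "gt",
    threshold_value: 25,
    lookback_minutes: 60,
    webhook_url: "https://hooks.example.com/emban",
  }),
});
if (!res.ok) throw new Error(`create alert failed: ${res.status}`);
const alert = await res.json();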

Webhook payload

Webhooks are POSTed with Content-Type: application/json. The payload identifies the alert, the dashboard/widget, and the exact condition that fired:

{
  "alert_id": 42,
  "name": "API error rate too high",
  "dashboard_id": "dash_abc123",
  "widget_id": "w_api_calls",
  "condition": {
    "type": "threshold",
    "operator": "gt",
    "value": 50
  },
  "state": "firing",
  "fired_at": "2026-04-24T14:30:00Z"
}

Emban retries webhooks up to 3 times on failure with backoff (0s, 30s, 120s). A response code in the 2xx range stops retries; anything else (4xx, 5xx, connection error, 10-second timeout) triggers the next attempt. All attempts — successes and failures — are logged to webhook_deliveries and visible under Admin → Webhooks in the app.
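
On the receiving side, the only hard requirement is returning a 2xx quickly; do slow work after acknowledging so the 10-second timeout never bites. A minimal Node/Express sketch (Express, the route path, and queueIncident are assumptions, not part of Emban):

import express from "express";

const app = express();
app.use(express.json());

// Acknowledge fast so Emban's retry loop stops, then hand off.
app.post("/emban", (req, res) => {
  const { alert_id, name, condition, state, fired_at } = req.body;
  res.sendStatus(204); // any 2xx stops retries
  // Fan out asynchronously: page, post to chat, open a ticket, etc.
  queueIncident({ alert_id, name, condition, state, fired_at });
});

function queueIncident(payload: unknown): void {
  console.log("alert received:", payload); // placeholder for real routing
}

app.listen(3000);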

SSRF protection. Webhook URLs pointing at private ranges (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, link-local) are rejected before the request is dispatched. If you need internal delivery, route it through an external relay or a public proxy.
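
The check amounts to resolving the webhook hostname and rejecting any address in the listed ranges before connecting. A simplified, IPv4-only TypeScript sketch; the helper names are hypothetical and Emban's actual implementation may differ:

import { promises as dns } from "node:dns";

// The blocked ranges from above, as (base, prefix-length) pairs.
const PRIVATE_V4: Array<[string, number]> = [
  ["127.0.0.0", 8], ["10.0.0.0", 8], ["172.16.0.0", 12],
  ["192.168.0.0", 16], ["169.254.0.0", 16], // link-local
];

const toInt = (ip: string): number =>
  ip.split(".").reduce((acc, oct) => (acc << 8) | parseInt(oct, 10), 0) >>> 0;

function isPrivate(ip: string): boolean {
  const addr = toInt(ip);
  return PRIVATE_V4.some(([base, bits]) => {
    const mask = (~0 << (32 - bits)) >>> 0;
    return (addr & mask) === (toInt(base) & mask);
  });
}

// Resolve before dispatch; reject if the host resolves to a private address.
async function assertPublicWebhook(url: string): Promise<void> {
  const { address } = await dns.lookup(new URL(url).hostname);
  if (isPrivate(address)) {
    throw new Error(`webhook host resolves to private address ${address}`);
  }
}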

Email delivery

Emails send through the org's configured SMTP relay. Subject is [Emban Alert] <alert name>; the body includes the condition, widget reference, and a link back to the dashboard:

Subject: [Emban Alert] API error rate too high

Alert: API error rate too high
Condition: threshold gt 50.00
Dashboard: dash_abc123
Widget: w_api_calls

View dashboard: https://emban.example.com/app/dashboards/dash_abc123

---
Emban Alert System

Managing alerts

# List alerts
GET /v1/alerts
  → [{"id":42,"name":"...","last_state":"ok",...}, ...]

# Get one
GET /v1/alerts/42

# Pause/resume (does not delete alert_events history)
PATCH /v1/alerts/42
{"enabled": false}

# Adjust threshold without recreating
PATCH /v1/alerts/42
{"threshold_value": 75, "lookback_minutes": 30}

# Full history of firings and resolutions
GET /v1/alerts/42/events
  → [{"fired_at":"...","state":"firing","value":54.2,"threshold":50}, ...]

# Delete alert (cascades alert_events)
DELETE /v1/alerts/42
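
A common combination of these endpoints is a maintenance-window mute: list, pause, then resume afterwards. A sketch using the documented GET and PATCH endpoints (the helper name, base URL, and env var are illustrative):

const BASE = "https://emban.example.com/v1";
const HEADERS = {
  Authorization: `Bearer ${process.env.EMBAN_ADMIN_API_KEY}`,
  "Content-Type": "application/json",
};

// Flip every alert's enabled flag, e.g. around a planned migration.
async function setAllAlerts(enabled: boolean): Promise<void> {
  const alerts = await (await fetch(`${BASE}/alerts`, { headers: HEADERS })).json();
  for (const a of alerts) {
    await fetch(`${BASE}/alerts/${a.id}`, {
      method: "PATCH",
      headers: HEADERS,
      body: JSON.stringify({ enabled }),
    });
  }
}

await setAllAlerts(false); // pause before the window
// ...maintenance...
await setAllAlerts(true);  // resume after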

Design patterns

Pattern 1: Lookback ≥ evaluation window. Alerts tick every 60 seconds. A lookback_minutes of 1 means the alert evaluates a single minute of data: a small sample and a noisy signal. Prefer 5, 15, or 60 so transient spikes don't page you.
Pattern 2: Cooldown matches your response SLO. If your response time to a real incident is 15 minutes, set cooldown_minutes to at least that. Shorter cooldowns page the same incident twice; longer ones mute real re-escalations.
Pattern 3: Use delta for rate-of-change, not totals. Delta alerts compare the current period to the previous one of equal length. They catch "this dropped 40%" cases that a static threshold cannot (see the worked example after this list). Use threshold for ceilings and floors; use delta for drift.
Pattern 4: Point no_data at ingestion health. A no_data alert on a widget that counts events over 10 minutes tells you the pipeline stopped. It is the cheapest possible ingestion-outage page and costs you nothing except the widget.
Pattern 5: Webhook + email, belt and braces. If your incident flow runs through PagerDuty/Opsgenie/Slack, send the webhook there and also CC a human inbox. Webhook retries cover transient downtime; the email is the fallback when the whole relay is down.
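
A worked delta example, assuming absolute percent change as in the condition table (the traffic numbers are invented for illustration). With operator gt, threshold_value 25, and lookback_minutes 60, the alert compares the last hour against the hour before it:

// Requests/min averaged over two adjacent 60-minute periods.
const previous = 1200; // prior period
const current  = 700;  // most recent period
const pctChange = Math.abs((current - previous) / previous) * 100; // ≈ 41.7

// gt 25 → 41.7 > 25, so the alert fires on this ~40% drop,
// even though 700 req/min might sit comfortably under a static threshold.
console.log(pctChange > 25); // true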

Observability

Every webhook attempt is recorded in the webhook_deliveries Postgres table with org scope, source (alert vs. report), attempt number, HTTP status, response excerpt (first 1 KB), duration, and error. The Admin → Webhooks page aggregates this into a per-alert summary.

Related: see Scheduled Reports for recurring dashboard exports (CSV/PNG/PDF) that share the same email/webhook delivery and webhook_deliveries observability. The API reference lists every alert endpoint and parameter.