Observability
Prometheus metrics, OpenTelemetry traces, Datadog, Honeycomb, Tempo, Langfuse, Splunk, and Slack — wire Gateway-LLM into whatever you already run.
Updated 2026-04-29
Gateway-LLM ships first-class export paths so you can plug it into whatever observability stack you already run. There are two complementary surfaces:
- Prometheus at GET /metrics: pull-based, label-rich, vendor-agnostic.
- Callback exporters: push-based, one event per request, ideal for APM, log, and trace backends (Datadog, Honeycomb, Tempo, Langfuse, Splunk, Slack).
Most operators end up using both: Prometheus for live RED dashboards, callbacks for per-request traces and spend records.
1. Prometheus
The gateway publishes a Prometheus scrape endpoint by default. The endpoint requires no auth, so bind it on an internal listener if your environment forbids unauthenticated endpoints, or disable it by setting enabled to false in the metrics block:
metrics:
  enabled: true
  path: /metrics
Exposed series
All series are namespaced under gatewayllm_:
| Metric | Type | Labels | Meaning |
|---|---|---|---|
| gatewayllm_requests_total | counter | model, provider, status | One increment per completed request. |
| gatewayllm_request_duration_seconds | histogram | model, provider | End-to-end latency. |
| gatewayllm_tokens_total | counter | model, provider, kind | Tokens billed by upstream. |
| gatewayllm_cost_usd_total | counter | model, provider | Cumulative USD spend. |
| gatewayllm_cache_hit_total | counter | type (exact/semantic) | Cache hits served. |
| gatewayllm_rate_limit_hit_total | counter | kind (rpm/tpm) | 429s emitted by the rate limiter. |
| gatewayllm_smart_route_decisions_total | counter | bucket, override | SmartRoute classifications. |
| gatewayllm_circuit_breaker_open | gauge | deployment, provider | 1 when a deployment's breaker is open. |
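As a Prometheus histogram, gatewayllm_request_duration_seconds is exposed as cumulative _bucket series keyed by an le label, so latency percentiles come from interpolating between bucket bounds. In PromQL that is histogram_quantile over a rate; the Python sketch below reproduces the same linear interpolation so the arithmetic is visible. It assumes you have already collected (upper_bound, cumulative_count) pairs for one model/provider, including the +Inf bucket, and that for a recent percentile you pass the difference between two scrapes rather than lifetime counters. The bucket bounds in the example are hypothetical; the gateway's actual buckets are not listed on this page.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Uses the same linear interpolation as PromQL's histogram_quantile().
    `buckets` must include the +Inf bucket, whose count is the total.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in this window
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # durations are non-negative
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the last finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            # Assume observations are spread evenly inside the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    # Hypothetical cumulative counts for one model over one window (seconds).
    b = [(0.1, 10), (0.5, 60), (1.0, 90), (2.5, 98), (float("inf"), 100)]
    print(histogram_quantile(0.95, b))  # 1.9375
```

The estimate is only as fine as the bucket layout: the 95th request lands in the 1.0 to 2.5 s bucket, so anything inside that range is interpolated, not measured.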
Scraping
Plain Prometheus
scrape_configs:
  - job_name: gateway-llm
    static_configs:
      - targets: ['gatewayllm:8080']
Datadog Agent (OpenMetrics)
init_config:

instances:
  - openmetrics_endpoint: http://gatewayllm:8080/metrics
    namespace: gatewayllm
    metrics:
      - 'gatewayllm_*'
Grafana Agent / Mimir
metrics:
  configs:
    - name: gateway-llm
      scrape_configs:
        - job_name: gateway-llm
          static_configs:
            - targets: ['gatewayllm:8080']
          metrics_path: /metrics
A pre-built Grafana dashboard JSON ships in docs/dashboards/ — import it and you have RED + spend graphs in five minutes.
2. Callback exporters
Callback exporters fire once per completed request with a structured event you can ship to any APM, log, or trace backend. Configure them in config.yaml:
observability:
  exporters:
    - type: otlp
      endpoint: https://api.honeycomb.io
      headers:
        x-honeycomb-team: ${HONEYCOMB_API_KEY}
    - type: datadog
      api_key_env: DD_API_KEY
      site: datadoghq.com
    - type: langfuse
      public_key_env: LANGFUSE_PUBLIC_KEY
      secret_key_env: LANGFUSE_SECRET_KEY
    - type: slack
      webhook_url_env: SLACK_ALERTS_WEBHOOK
      filter:
        min_cost_usd: 0.10
Built-in exporter types
- otlp: OpenTelemetry. Works with Honeycomb, Tempo, Jaeger, New Relic, and Grafana Cloud Traces.
- datadog: Datadog APM (traces) and Logs.
- langfuse: purpose-built LLM tracing with model versioning.
- splunk: HTTP Event Collector.
- slack: alerts when a request crosses a cost threshold.
- webhook: generic JSON POST. Roll your own.
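The webhook exporter posts generic JSON, so the receiving end can be anything that accepts an HTTP POST. Below is a minimal stdlib Python receiver, assuming each POST body is a single event object like the one under Event shape. The port (9000), the fields it keeps, and the handle_event/WebhookHandler names are hypothetical, and the exporter's own config keys (target URL, auth headers) are not documented on this page, so check the configuration reference before wiring it up.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_event(body: bytes) -> dict:
    """Decode one event and keep a hypothetical subset of fields to store."""
    event = json.loads(body)  # raises ValueError on malformed JSON
    return {
        "request_id": event.get("request_id"),
        "team": event.get("team"),
        "deployment": event.get("deployment"),
        "cost_usd": float(event.get("cost_usd", 0.0)),
        "duration_ms": event.get("duration_ms"),
    }


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            record = handle_event(self.rfile.read(length))
        except (ValueError, TypeError, AttributeError):
            # Not a JSON object with the expected fields.
            self.send_response(400)
            self.end_headers()
            return
        print(json.dumps(record))  # replace with your own sink (DB, queue, file)
        self.send_response(204)
        self.end_headers()


if __name__ == "__main__":
    # Hypothetical port; point the webhook exporter at http://<host>:9000/.
    HTTPServer(("0.0.0.0", 9000), WebhookHandler).serve_forever()
```

Answering quickly with 204 and doing any heavy processing elsewhere keeps the receiver from becoming the slow link for the exporter.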
Multiple exporters can run in parallel; events are fanned out asynchronously so they don't add latency to the request path.
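Fanning events out without adding request latency generally means the request path only schedules work and never awaits a backend. The asyncio sketch below illustrates that pattern, including a per-exporter cost filter modeled on the slack exporter's min_cost_usd. It is not Gateway-LLM's implementation: the FanOut and Exporter names, the inclusive (>=) threshold, and the simulated backends are assumptions for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

Event = dict


@dataclass
class Exporter:
    name: str
    send: Callable[[Event], Awaitable[None]]
    min_cost_usd: Optional[float] = None  # mirrors the slack filter in the config above

    def accepts(self, event: Event) -> bool:
        # Assumption: the threshold is inclusive.
        return self.min_cost_usd is None or event.get("cost_usd", 0.0) >= self.min_cost_usd


class FanOut:
    """Fire-and-forget delivery: the request path never awaits an exporter."""

    def __init__(self, exporters: list[Exporter]):
        self.exporters = exporters
        self._pending: set = set()

    def emit(self, event: Event) -> None:
        for exp in self.exporters:
            if exp.accepts(event):
                task = asyncio.create_task(self._deliver(exp, event))
                self._pending.add(task)  # keep a reference so the task isn't GC'd
                task.add_done_callback(self._pending.discard)

    async def _deliver(self, exp: Exporter, event: Event) -> None:
        try:
            await exp.send(event)
        except Exception as exc:  # one failing backend must not affect the others
            print(f"exporter {exp.name} failed: {exc!r}")

    async def drain(self) -> None:
        # Wait for in-flight deliveries, e.g. on shutdown.
        if self._pending:
            await asyncio.gather(*self._pending)


async def main() -> list:
    delivered = []

    async def otlp_send(event):
        await asyncio.sleep(0.05)  # simulate a slow backend
        delivered.append(("otlp", event["request_id"]))

    async def slack_send(event):
        delivered.append(("slack", event["request_id"]))

    fan = FanOut([
        Exporter("otlp", otlp_send),
        Exporter("slack", slack_send, min_cost_usd=0.10),
    ])
    fan.emit({"request_id": "req_cheap", "cost_usd": 0.000099})
    fan.emit({"request_id": "req_pricey", "cost_usd": 0.25})
    assert delivered == []  # emit() returned before any exporter ran
    await fan.drain()
    return delivered


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because each delivery runs in its own task and handles its own exceptions, a slow or failing backend delays only itself. The trade-off is that events still in flight are lost if the process exits without draining, so a production version would typically also bound the backlog and flush on shutdown.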
Event shape
{
  "request_id": "req_01HZ...",
  "ts": "2026-04-29T12:34:56Z",
  "key_id": "vk_01HZX...",
  "team": "payments",
  "model_alias": "auto",
  "deployment": "openai/gpt-4o-mini",
  "router": {
    "strategy": "classifier",
    "bucket": "simple",
    "score": 0.21,
    "decided_in_us": 0.8
  },
  "prompt_tokens": 312,
  "completion_tokens": 81,
  "cost_usd": 0.000099,
  "duration_ms": 412,
  "cache": { "hit": false, "type": null }
}
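Because every event carries team, token counts, and cost_usd, per-team spend reports reduce to a group-by. The Python sketch below parses events of the shape shown above into a typed record and totals them per team. It assumes the field names in the example payload, treats a missing team as a hypothetical "unassigned" bucket, and the GatewayEvent and spend_by_team names are illustrative.

```python
import json
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GatewayEvent:
    request_id: str
    team: Optional[str]
    deployment: str
    prompt_tokens: int
    completion_tokens: int
    cost_usd: float
    duration_ms: int
    cache_hit: bool

    @classmethod
    def from_json(cls, raw: str) -> "GatewayEvent":
        d = json.loads(raw)
        return cls(
            request_id=d["request_id"],
            team=d.get("team"),
            deployment=d["deployment"],
            prompt_tokens=int(d.get("prompt_tokens", 0)),
            completion_tokens=int(d.get("completion_tokens", 0)),
            cost_usd=float(d.get("cost_usd", 0.0)),
            duration_ms=int(d.get("duration_ms", 0)),
            # "cache" is an object like {"hit": false, "type": null}.
            cache_hit=bool((d.get("cache") or {}).get("hit", False)),
        )

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


def spend_by_team(events) -> dict:
    """Sum cost, tokens, and request count per team across a batch of events."""
    totals = defaultdict(lambda: {"cost_usd": 0.0, "tokens": 0, "requests": 0})
    for ev in events:
        t = totals[ev.team or "unassigned"]
        t["cost_usd"] += ev.cost_usd
        t["tokens"] += ev.total_tokens
        t["requests"] += 1
    return dict(totals)
```

The same loop keyed on key_id instead of team gives per-key spend.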
3. Logs
Application logs are emitted to stdout as structured JSON and include request IDs, which you can match against the request_id field of callback events. Configure verbosity with log_level:
log_level: info # debug | info | warn | error
In production, ship stdout to your aggregator (Datadog Logs, Splunk, ELK) and you have the third leg of the observability stool.
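Once logs and callback events land in the same place, correlating them is a join on the request ID. The Python sketch below indexes JSON log lines by request ID and looks up the lines for a given callback event. The log field name request_id is an assumption (it matches the callback event field, but the log schema is not specified on this page), as are the level and msg fields you might see in practice and the function names.

```python
import json
from collections import defaultdict


def index_logs(lines, id_field: str = "request_id") -> dict:
    """Group JSON log lines by request ID. `id_field` is an assumption; check your logs."""
    by_id = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip any non-JSON output mixed into stdout
        if isinstance(record, dict) and record.get(id_field):
            by_id[record[id_field]].append(record)
    return dict(by_id)


def logs_for_event(event: dict, log_index: dict) -> list:
    """Return the log records that share the callback event's request_id."""
    return log_index.get(event.get("request_id"), [])
```

In a real aggregator the same join is usually a saved search or query on the request ID field; the sketch just makes the matching rule explicit.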
What's next
- Configuration: full config.yaml reference.
- Virtual API keys: every event includes the key_id and team for slicing.
- The observability cheatsheet on GitHub: exhaustive provider-by-provider walkthroughs.
Stuck or want a feature? Email the founders directly at mitshawtechnologies@gmail.com. We answer fast.