Observability

Prometheus metrics, OpenTelemetry traces, Datadog, Honeycomb, Tempo, Langfuse, Splunk, and Slack — wire Gateway-LLM into whatever you already run.

3 min read · updated 2026-04-29

Gateway-LLM ships first-class export paths so you can plug it into whatever observability stack you already run. There are two complementary surfaces:

  1. Prometheus at GET /metrics — pull-based, label-rich, vendor-agnostic.
  2. Callback exporters — push-based, one event per request, ideal for APM/log/trace backends (Datadog, Honeycomb, Tempo, Langfuse, Splunk, Slack).

Most operators end up using both: Prometheus for live RED dashboards, callbacks for per-request traces and spend records.

1. Prometheus

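The gateway publishes a Prometheus scrape endpoint by default. The endpoint requires no authentication, so bind it to an internal listener if your environment forbids unauthenticated endpoints, or disable it by setting metrics.enabled to false. The relevant settings, shown here with the endpoint enabled: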

metrics:
  enabled: true
  path: /metrics

Exposed series

All series are namespaced under gatewayllm_:

| Metric | Type | Labels | Meaning |
|---|---|---|---|
| gatewayllm_requests_total | counter | model, provider, status | One increment per completed request. |
| gatewayllm_request_duration_seconds | histogram | model, provider | End-to-end latency. |
| gatewayllm_tokens_total | counter | model, provider, kind | Tokens billed by upstream. |
| gatewayllm_cost_usd_total | counter | model, provider | Cumulative USD spend. |
| gatewayllm_cache_hit_total | counter | type (exact/semantic) | Cache hits served. |
| gatewayllm_rate_limit_hit_total | counter | kind (rpm/tpm) | 429s emitted by the rate limiter. |
| gatewayllm_smart_route_decisions_total | counter | bucket, override | SmartRoute classifications. |
| gatewayllm_circuit_breaker_open | gauge | deployment, provider | 1 when a deployment's breaker is open. |
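
To sanity-check a scrape without a Prometheus server, the series above can be read straight from the text exposition format. The following is a minimal sketch that sums gatewayllm_cost_usd_total per model. The sample scrape text, its values, and the model and provider label values are made up for illustration, and the parser handles only plain sample lines (no exemplars).

```python
import re
from collections import defaultdict

# One sample line of the Prometheus text format: name{labels} value [timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (metric_name, labels_dict, value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and # HELP / # TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything this simple parser does not understand
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        yield match.group("name"), labels, float(match.group("value"))


def sum_by_label(text, metric, label):
    """Sum one metric's samples, grouped by the value of a single label."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == metric:
            totals[labels.get(label, "")] += value
    return dict(totals)


# Illustrative scrape output; values and label values are assumptions.
SAMPLE_SCRAPE = """\
# TYPE gatewayllm_cost_usd_total counter
gatewayllm_cost_usd_total{model="gpt-4o-mini",provider="openai"} 1.25
gatewayllm_cost_usd_total{model="gpt-4o",provider="openai"} 7.5
gatewayllm_cost_usd_total{model="gpt-4o",provider="azure"} 2.5
"""

spend_by_model = sum_by_label(SAMPLE_SCRAPE, "gatewayllm_cost_usd_total", "model")
```

In practice the text would come from an HTTP GET against the /metrics path rather than a string literal. Counters are cumulative, so a spend rate needs the difference between two scrapes.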

Scraping

Plain Prometheus

scrape_configs:
  - job_name: gateway-llm
    static_configs:
      - targets: ['gatewayllm:8080']

Datadog Agent (OpenMetrics)

init_config:
instances:
  - openmetrics_endpoint: http://gatewayllm:8080/metrics
    namespace: gatewayllm
    metrics:
      - 'gatewayllm_*'

Grafana Agent / Mimir

metrics:
  configs:
    - name: gateway-llm
      scrape_configs:
        - job_name: gateway-llm
          static_configs:
            - targets: ['gatewayllm:8080']
          metrics_path: /metrics

A pre-built Grafana dashboard JSON ships in docs/dashboards/ — import it and you have RED + spend graphs in five minutes.
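
Latency panels on a RED dashboard are usually derived from histogram buckets such as those of gatewayllm_request_duration_seconds. As a rough illustration of what such a panel computes, here is a sketch of quantile estimation from cumulative buckets using linear interpolation within the matching bucket, the same general approach as PromQL's histogram_quantile. The bucket boundaries and counts in the example are assumptions; the gateway's actual bucket layout is not listed here.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: iterable of (upper_bound, cumulative_count) pairs, including
    the +Inf bucket, as exposed by the *_bucket series with the le label.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(buckets)
    total = ordered[-1][1]
    if total == 0:
        return math.nan  # no observations yet
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in ordered:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket: the best available
                # answer is the highest finite upper bound.
                return prev_bound
            # Interpolate linearly inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical buckets: 10 requests <= 0.1s, 60 <= 0.5s, 90 <= 1s, 100 total.
EXAMPLE_BUCKETS = [(0.1, 10), (0.5, 60), (1.0, 90), (math.inf, 100)]
median_seconds = histogram_quantile(0.5, EXAMPLE_BUCKETS)
```

The estimate is only as precise as the bucket boundaries allow, which is worth keeping in mind when reading p95 or p99 panels.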

2. Callback exporters

Callback exporters fire once per completed request with a structured event you can ship to any APM, log, or trace backend. Configure them in config.yaml:

observability:
  exporters:
    - type: otlp
      endpoint: https://api.honeycomb.io
      headers:
        x-honeycomb-team: ${HONEYCOMB_API_KEY}
    - type: datadog
      api_key_env: DD_API_KEY
      site: datadoghq.com
    - type: langfuse
      public_key_env: LANGFUSE_PUBLIC_KEY
      secret_key_env: LANGFUSE_SECRET_KEY
    - type: slack
      webhook_url_env: SLACK_ALERTS_WEBHOOK
      filter:
        min_cost_usd: 0.10

Built-in exporter types

  • otlp — OpenTelemetry. Works with Honeycomb, Tempo, Jaeger, New Relic, Grafana Cloud Traces.
  • datadog — Datadog APM (traces) + Logs.
  • langfuse — purpose-built LLM tracing with model versioning.
  • splunk — HTTP Event Collector.
  • slack — alerts when a request crosses a cost threshold.
  • webhook — generic JSON POST. Roll your own.

Multiple exporters can run in parallel; events are fanned out asynchronously so they don't add latency to the request path.
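
For the webhook type, the receiving side can be very small. The sketch below assumes the exporter POSTs one JSON object per request in the shape shown under "Event shape"; the listen port, the in-memory sink, and the choice of response codes are hypothetical, and how the gateway treats non-2xx replies is not specified here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

RECEIVED = []  # in-memory sink for the sketch; a real receiver would persist or forward


def handle_body(body: bytes) -> int:
    """Validate one POSTed event and return the HTTP status to reply with."""
    try:
        event = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400
    if not isinstance(event, dict) or "request_id" not in event:
        return 422
    RECEIVED.append(event)
    return 204


class EventHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status = handle_body(self.rfile.read(length))
        self.send_response(status)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # silence the default per-request stderr logging


def make_server(port: int = 9000) -> HTTPServer:
    """Build a localhost-only server; the port number is an arbitrary choice."""
    return HTTPServer(("127.0.0.1", port), EventHandler)
```

Calling make_server().serve_forever() starts the receiver. Keeping the validation in handle_body, separate from the HTTP plumbing, makes it easy to unit-test without opening a socket.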

Event shape

{
  "request_id": "req_01HZ...",
  "ts": "2026-04-29T12:34:56Z",
  "key_id": "vk_01HZX...",
  "team": "payments",
  "model_alias": "auto",
  "deployment": "openai/gpt-4o-mini",
  "router": {
    "strategy": "classifier",
    "bucket": "simple",
    "score": 0.21,
    "decided_in_us": 0.8
  },
  "prompt_tokens": 312,
  "completion_tokens": 81,
  "cost_usd": 0.000099,
  "duration_ms": 412,
  "cache": { "hit": false, "type": null }
}
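
As an illustration of consuming this event, the following sketch derives a few per-request figures and applies a cost threshold in the spirit of the Slack exporter's filter.min_cost_usd. The function names are hypothetical, and the "greater than or equal" comparison is an assumption about the filter's semantics, not documented behaviour.

```python
import json


def summarize(event):
    """Derive a few per-request figures from one callback event."""
    total = event["prompt_tokens"] + event["completion_tokens"]
    return {
        "request_id": event["request_id"],
        "team": event["team"],
        "deployment": event["deployment"],
        "total_tokens": total,
        # Guard against division by zero for requests that billed no tokens.
        "usd_per_1k_tokens": event["cost_usd"] / total * 1000 if total else 0.0,
        "cache_hit": event["cache"]["hit"],
    }


def crosses_cost_threshold(event, min_cost_usd):
    """Assumed semantics: keep events costing at least min_cost_usd."""
    return event["cost_usd"] >= min_cost_usd


# The example event from above, parsed from its JSON form.
EXAMPLE_EVENT = json.loads("""
{
  "request_id": "req_01HZ...",
  "ts": "2026-04-29T12:34:56Z",
  "key_id": "vk_01HZX...",
  "team": "payments",
  "model_alias": "auto",
  "deployment": "openai/gpt-4o-mini",
  "router": {"strategy": "classifier", "bucket": "simple",
             "score": 0.21, "decided_in_us": 0.8},
  "prompt_tokens": 312,
  "completion_tokens": 81,
  "cost_usd": 0.000099,
  "duration_ms": 412,
  "cache": {"hit": false, "type": null}
}
""")

summary = summarize(EXAMPLE_EVENT)
would_alert = crosses_cost_threshold(EXAMPLE_EVENT, 0.10)
```

With the example event, total_tokens is 393 and the request stays well under a 0.10 USD threshold, so would_alert is False.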

3. Logs

Application logs are emitted to stdout as structured JSON and include request IDs, which let you correlate log lines with metrics and callback events. Configure verbosity with log_level:

log_level: info  # debug | info | warn | error

In production, ship stdout to your aggregator (Datadog Logs, Splunk, ELK) and you have the third leg of the observability stool.
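
Correlating logs with callback events comes down to a join on the request ID. A minimal sketch, assuming each log line is one JSON object carrying a request_id field; the log field names (request_id, level, msg) are assumptions, since the log schema is not listed here.

```python
import json
from collections import defaultdict


def index_logs(lines, id_field="request_id"):
    """Group JSON log records by request ID, skipping lines that do not fit."""
    by_id = defaultdict(list)
    for line in lines:
        try:
            record = json.loads(line)
        except ValueError:
            continue  # tolerate stray non-JSON output on stdout
        if isinstance(record, dict) and id_field in record:
            by_id[record[id_field]].append(record)
    return dict(by_id)


def logs_for_event(event, log_index):
    """Return the log records that share the callback event's request_id."""
    return log_index.get(event["request_id"], [])


# Hypothetical log lines; only the presence of a request ID comes from the docs.
SAMPLE_LOG_LINES = [
    '{"level": "info", "request_id": "req_a", "msg": "routed"}',
    "plain text line",
    '{"level": "warn", "request_id": "req_a", "msg": "retry"}',
    '{"level": "info", "msg": "startup"}',
    '{"level": "info", "request_id": "req_b", "msg": "routed"}',
]

log_index = index_logs(SAMPLE_LOG_LINES)
```

Most log aggregators can run the same join natively once request_id is indexed as a field; the sketch is mainly useful for ad hoc debugging against a captured stdout stream.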


Stuck or want a feature? Email the founders directly at mitshawtechnologies@gmail.com. We answer fast.