API reference

Gateway-LLM HTTP API

Gateway-LLM speaks the OpenAI HTTP API natively. Everything in the OpenAI Chat Completions, Responses, Embeddings, and Moderations APIs works as-is — you only need to know about the gateway-specific surface listed below.

Base URL

https://your-gateway.example.com   # self-hosted
https://api.gateway-llm.com         # hosted (Custom plan)

Authentication

Every non-admin request requires a virtual API key in the Authorization header.

Authorization: Bearer gw_virt_a8f2...e7c1

Admin endpoints (under /admin/*) require the master key from your config.yaml.

OpenAI-compatible endpoints

These work as drop-ins — the OpenAI SDK does it for you.

EndpointPurpose
POST /v1/chat/completionsChat / instruction-following
POST /v1/responsesResponses API (bridged for non-OpenAI providers)
POST /v1/completionsLegacy completions
POST /v1/embeddingsEmbedding vectors
POST /v1/moderationsOpenAI moderations
POST /v1/images/generationsImage generation (where supported)
POST /v1/audio/transcriptionsAudio (where supported)

Gateway-specific response headers

Every successful request returns these custom headers so you can see what the router did without parsing the body.

HeaderExampleMeaning
X-Gateway-Decisionopenai/gpt-4o-miniProvider/model that served the call.
X-Gateway-Route-BucketsimpleSmart-router bucket for the prompt.
X-Gateway-Route-Score0.21Classifier score (0–1).
X-Gateway-Cost-Usd0.000099Computed cost of this request.
X-Gateway-Retries0How many failover retries it took.

Admin endpoints

All require the master key as Authorization.

Virtual keys

EndpointAction
POST /admin/keysCreate a virtual key.
GET /admin/keysList virtual keys.
GET /admin/keys/:idInspect a key (without the secret).
PATCH /admin/keys/:idUpdate limits, models, budget.
DELETE /admin/keys/:idRevoke immediately.

Deployments and routing

EndpointAction
GET /admin/deploymentsList configured deployments.
POST /admin/deployments/disableTake a deployment out of rotation.
POST /admin/deployments/enablePut a deployment back in rotation.
GET /admin/spendPer-key, per-team, per-model spend rollups.

Error format

HTTP/1.1 429 Too Many Requests
Retry-After: 12
X-Gateway-Limit-Kind: rpm

{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Key vk_01HZ... exceeded its 120 req/min limit.",
    "kind": "rpm",
    "retry_after_seconds": 12
  }
}

OpenAPI spec

The full machine-readable schema lives in the OSS repo — point your generator at it for client SDKs.

Need a deeper walkthrough?

The Quickstart and the language SDK pages walk through real request flows end-to-end.