API reference
Gateway-LLM HTTP API
Gateway-LLM speaks the OpenAI HTTP API natively. Everything in the OpenAI Chat Completions, Responses, Embeddings, and Moderations APIs works as-is — you only need to know about the gateway-specific surface listed below.
Base URL
https://your-gateway.example.com # self-hosted
https://api.gateway-llm.com # hosted (Custom plan)Authentication
Every non-admin request requires a virtual API key in the Authorization header.
Authorization: Bearer gw_virt_a8f2...e7c1Admin endpoints (under /admin/*) require the master key from your config.yaml.
OpenAI-compatible endpoints
These work as drop-ins — the OpenAI SDK does it for you.
| Endpoint | Purpose |
|---|---|
POST /v1/chat/completions | Chat / instruction-following |
POST /v1/responses | Responses API (bridged for non-OpenAI providers) |
POST /v1/completions | Legacy completions |
POST /v1/embeddings | Embedding vectors |
POST /v1/moderations | OpenAI moderations |
POST /v1/images/generations | Image generation (where supported) |
POST /v1/audio/transcriptions | Audio (where supported) |
Gateway-specific response headers
Every successful request returns these custom headers so you can see what the router did without parsing the body.
| Header | Example | Meaning |
|---|---|---|
X-Gateway-Decision | openai/gpt-4o-mini | Provider/model that served the call. |
X-Gateway-Route-Bucket | simple | Smart-router bucket for the prompt. |
X-Gateway-Route-Score | 0.21 | Classifier score (0–1). |
X-Gateway-Cost-Usd | 0.000099 | Computed cost of this request. |
X-Gateway-Retries | 0 | How many failover retries it took. |
Admin endpoints
All require the master key as Authorization.
Virtual keys
| Endpoint | Action |
|---|---|
POST /admin/keys | Create a virtual key. |
GET /admin/keys | List virtual keys. |
GET /admin/keys/:id | Inspect a key (without the secret). |
PATCH /admin/keys/:id | Update limits, models, budget. |
DELETE /admin/keys/:id | Revoke immediately. |
Deployments and routing
| Endpoint | Action |
|---|---|
GET /admin/deployments | List configured deployments. |
POST /admin/deployments/disable | Take a deployment out of rotation. |
POST /admin/deployments/enable | Put a deployment back in rotation. |
GET /admin/spend | Per-key, per-team, per-model spend rollups. |
Error format
HTTP/1.1 429 Too Many Requests
Retry-After: 12
X-Gateway-Limit-Kind: rpm
{
"error": {
"type": "rate_limit_exceeded",
"message": "Key vk_01HZ... exceeded its 120 req/min limit.",
"kind": "rpm",
"retry_after_seconds": 12
}
}OpenAPI spec
The full machine-readable schema lives in the OSS repo — point your generator at it for client SDKs.
Need a deeper walkthrough?
The Quickstart and the language SDK pages walk through real request flows end-to-end.