API reference

Gateway-LLM HTTP API

Gateway-LLM speaks the OpenAI HTTP API natively. Everything in the OpenAI Chat Completions, Responses, Embeddings, and Moderations APIs works as-is — you only need to know about the gateway-specific surface listed below.

Base URL

https://your-gateway.example.com   # self-hosted
https://api.gateway-llm.com         # hosted (Custom plan)

Authentication

Every non-admin request requires a virtual API key in the Authorization header.

Authorization: Bearer gw_virt_a8f2...e7c1

Admin endpoints (under /admin/*) require the master key from your config.yaml.

OpenAI-compatible endpoints

These work as drop-ins — the OpenAI SDK does it for you.

Endpoint	Purpose
`POST /v1/chat/completions`	Chat / instruction-following
`POST /v1/responses`	Responses API (bridged for non-OpenAI providers)
`POST /v1/completions`	Legacy completions
`POST /v1/embeddings`	Embedding vectors
`POST /v1/moderations`	OpenAI moderations
`POST /v1/images/generations`	Image generation (where supported)
`POST /v1/audio/transcriptions`	Audio (where supported)

Gateway-specific response headers

Every successful request returns these custom headers so you can see what the router did without parsing the body.

Header	Example	Meaning
`X-Gateway-Decision`	`openai/gpt-4o-mini`	Provider/model that served the call.
`X-Gateway-Route-Bucket`	`simple`	Smart-router bucket for the prompt.
`X-Gateway-Route-Score`	`0.21`	Classifier score (0–1).
`X-Gateway-Cost-Usd`	`0.000099`	Computed cost of this request.
`X-Gateway-Retries`	`0`	How many failover retries it took.

Admin endpoints

All require the master key as Authorization.

Virtual keys

Endpoint	Action
`POST /admin/keys`	Create a virtual key.
`GET /admin/keys`	List virtual keys.
`GET /admin/keys/:id`	Inspect a key (without the secret).
`PATCH /admin/keys/:id`	Update limits, models, budget.
`DELETE /admin/keys/:id`	Revoke immediately.

Deployments and routing

Endpoint	Action
`GET /admin/deployments`	List configured deployments.
`POST /admin/deployments/disable`	Take a deployment out of rotation.
`POST /admin/deployments/enable`	Put a deployment back in rotation.
`GET /admin/spend`	Per-key, per-team, per-model spend rollups.

Error format

HTTP/1.1 429 Too Many Requests
Retry-After: 12
X-Gateway-Limit-Kind: rpm

{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Key vk_01HZ... exceeded its 120 req/min limit.",
    "kind": "rpm",
    "retry_after_seconds": 12
  }
}

OpenAPI spec

The full machine-readable schema lives in the OSS repo — point your generator at it for client SDKs.

Need a deeper walkthrough?

The Quickstart and the language SDK pages walk through real request flows end-to-end.

Quickstart SDKs