Documentation
Configuration
The shape of config.yaml, every environment variable, and how to manage secrets in production.
6 min read · updated 2026-04-29
Gateway-LLM is configured through one YAML file (config.yaml) plus a small set of environment variables. The YAML file defines what models exist; environment variables hold the secrets they need.
File structure
server:
port: 8080
read_timeout: 30s
write_timeout: 120s
graceful_shutdown_timeout: 30s
database:
url: ${DATABASE_URL}
max_connections: 25
redis:
url: ${REDIS_URL}
auth:
master_key: ${GATEWAY_LLM_MASTER_KEY}
model_list:
- model_alias: 'gpt-4o'
deployments:
- provider: openai
model: gpt-4o
api_key_env: OPENAI_API_KEY
- model_alias: 'smart'
deployments:
- provider: openai
model: gpt-4o
api_key_env: OPENAI_API_KEY
- provider: anthropic
model: claude-sonnet-4-20250514
api_key_env: ANTHROPIC_API_KEY
- provider: gemini
model: gemini-2.0-flash
api_key_env: GEMINI_API_KEY
router:
strategy: classifier # round_robin | least_latency | classifier
retries: 2
retry_after_ms: 200
cache:
enabled: true
semantic:
enabled: true
similarity_threshold: 0.93
ttl_seconds: 3600
Section by section
server
Boring HTTP knobs. port is the only one most teams change. Keep write_timeout generous — long completions will hold the connection open while streaming.
database and redis
PostgreSQL stores virtual keys, spend records, and audit logs. Redis holds rate-limit counters. Both URLs are read from environment variables (templated as ${DATABASE_URL} / ${REDIS_URL}) so the same config.yaml works in dev and prod.
If you don't need distributed rate limiting (single instance, no autoscale) you can skip Redis — the gateway falls back to an in-process counter.
auth.master_key
The single root secret. It can hit every admin endpoint (/admin/*). Don't ship it to your apps. Use it once to mint virtual keys, then store it in your secret manager and forget about it day-to-day.
model_list
The heart of the file. Each entry maps a stable alias that your code uses (smart, gpt-4o, cheap) to one or more deployments that actually fulfil it. When the alias has more than one deployment, the router decides which one wins per request.
- model_alias: 'cheap'
deployments:
- provider: openai
model: gpt-4o-mini
api_key_env: OPENAI_API_KEY
weight: 80
- provider: gemini
model: gemini-2.0-flash
api_key_env: GEMINI_API_KEY
weight: 20
weight (optional) biases round-robin selection. api_key_env references the name of the env var, not the value — so secrets stay out of YAML.
router
Picks one of three strategies:
round_robin— even split across deployments, weighted if you setweight.least_latency— picks the deployment with the lowest p50 over the last 60s.classifier— scores the prompt's complexity and picks a deployment per-request. This is what gives you the 40–70% savings.
retries and retry_after_ms apply on top of the strategy: if the chosen deployment 5xxs or times out, the router transparently re-tries the next one.
cache
Two layers. Exact-match (the same prompt, again) is a hash lookup. Semantic match (a different but very-similar prompt) uses an embedding similarity threshold; raise it to be conservative, lower it to cache more aggressively. See Semantic caching for LLMs for tuning advice.
Environment variables
| Name | Required | Purpose |
|---|---|---|
| GATEWAY_LLM_MASTER_KEY | yes | Root admin secret. Pick anything long and random. |
| DATABASE_URL | yes | PostgreSQL DSN, e.g. postgres://user:pass@db:5432/gw. |
| REDIS_URL | no | Redis DSN. If unset, rate-limits run in-process. |
| OPENAI_API_KEY | depends | Required if any deployment uses provider: openai. |
| ANTHROPIC_API_KEY | depends | Required if any deployment uses provider: anthropic. |
| GEMINI_API_KEY | depends | Required if any deployment uses provider: gemini. |
| AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_REGION | depends | For Bedrock deployments. |
| OLLAMA_BASE_URL | depends | For self-hosted Ollama deployments. Defaults to http://localhost:11434. |
openai_compat
Controls the OpenAI passthrough layer. When enabled, any /v1/* request that does not match a typed Gateway-LLM handler (like /v1/chat/completions or /v1/models) is forwarded to the upstream OpenAI-compatible provider. This lets Cursor, the OpenAI SDK, and other clients use files, assistants, threads, vector stores, batches, fine-tuning, and future OpenAI endpoints without waiting for typed handler support.
openai_compat:
enabled: true
default_model_alias: 'gpt-4o-mini' # used for model-less endpoints like /v1/files
# allowed_prefixes and blocked_prefixes have sensible defaults;
# override only if you need to restrict or extend the passthrough.
enabled— defaults totrue. Set tofalseto disable passthrough entirely.default_model_alias— the model alias used to resolve the upstream provider for endpoints that don't include amodelfield in their request body (e.g./v1/files,/v1/vector_stores). If unset, those requests will fail with a clear error.allowed_prefixes— list of/v1/*path prefixes the passthrough will forward. Defaults to all known official OpenAI endpoint families.blocked_prefixes— list of/v1/*path prefixes that are never forwarded (Gateway-owned routes like/v1/management,/v1/metrics,/v1/receipts, etc.).
Auth, rate-limiting, model access checks, and audit apply to passthrough requests identically to typed handlers.
Using Gateway-LLM with Cursor
To use Gateway-LLM as a custom OpenAI endpoint in Cursor:
- Open Cursor Settings > Models.
- Under the OpenAI section, set the API key to your Gateway-LLM key (
sk-gatewayllm-...). - Set the Base URL to
https://api.gateway-llm.com/v1(your gateway's URL with/v1suffix — do not include/chat/completions). - Select a model in Cursor that matches an alias configured in your Gateway-LLM
model_list(e.g.gpt-4o-mini).
If the model name Cursor sends doesn't match any alias in your gateway, you'll get a "model not found" error. Either add the model as an alias in your model_list, or create the API key with models: ["*"] to allow all aliases.
Reloading config
Changes to config.yaml are picked up on SIGHUP — no full restart needed:
docker compose kill -s SIGHUP gateway-llm
The reload is atomic. In-flight requests finish on the old config; new requests use the new one.
Production checklist
Before pointing real traffic at the gateway:
- Master key is in your secret manager, not in
.envon disk. config.yamlis checked into source control; secrets are not.- TLS terminates in front of the gateway (Caddy / Nginx / your cloud's LB).
/metricsis on an internal listener or behind auth.databaseandredisare persistent — they hold your spend audit trail and rate-limit state.
When all of that's true, see Smart routing to actually start saving money.
Stuck or want a feature? Email the founders directly at mitshawtechnologies@gmail.com. We answer fast.