Documentation

Configuration

The shape of config.yaml, every environment variable, and how to manage secrets in production.

6 min read · updated 2026-04-29

Gateway-LLM is configured through one YAML file (config.yaml) plus a small set of environment variables. The YAML file defines what models exist; environment variables hold the secrets they need.

File structure

server:
  port: 8080
  read_timeout: 30s
  write_timeout: 120s
  graceful_shutdown_timeout: 30s

database:
  url: ${DATABASE_URL}
  max_connections: 25

redis:
  url: ${REDIS_URL}

auth:
  master_key: ${GATEWAY_LLM_MASTER_KEY}

model_list:
  - model_alias: 'gpt-4o'
    deployments:
      - provider: openai
        model: gpt-4o
        api_key_env: OPENAI_API_KEY

  - model_alias: 'smart'
    deployments:
      - provider: openai
        model: gpt-4o
        api_key_env: OPENAI_API_KEY
      - provider: anthropic
        model: claude-sonnet-4-20250514
        api_key_env: ANTHROPIC_API_KEY
      - provider: gemini
        model: gemini-2.0-flash
        api_key_env: GEMINI_API_KEY

router:
  strategy: classifier  # round_robin | least_latency | classifier
  retries: 2
  retry_after_ms: 200

cache:
  enabled: true
  semantic:
    enabled: true
    similarity_threshold: 0.93
    ttl_seconds: 3600

Section by section

server

Boring HTTP knobs. port is the only one most teams change. Keep write_timeout generous — long completions will hold the connection open while streaming.

database and redis

PostgreSQL stores virtual keys, spend records, and audit logs. Redis holds rate-limit counters. Both URLs are read from environment variables (templated as ${DATABASE_URL} / ${REDIS_URL}) so the same config.yaml works in dev and prod.

If you don't need distributed rate limiting (single instance, no autoscale) you can skip Redis — the gateway falls back to an in-process counter.

auth.master_key

The single root secret. It can hit every admin endpoint (/admin/*). Don't ship it to your apps. Use it once to mint virtual keys, then store it in your secret manager and forget about it day-to-day.

model_list

The heart of the file. Each entry maps a stable alias that your code uses (smart, gpt-4o, cheap) to one or more deployments that actually fulfil it. When the alias has more than one deployment, the router decides which one wins per request.

- model_alias: 'cheap'
  deployments:
    - provider: openai
      model: gpt-4o-mini
      api_key_env: OPENAI_API_KEY
      weight: 80
    - provider: gemini
      model: gemini-2.0-flash
      api_key_env: GEMINI_API_KEY
      weight: 20

weight (optional) biases round-robin selection. api_key_env references the name of the env var, not the value — so secrets stay out of YAML.

router

Picks one of three strategies:

  • round_robin — even split across deployments, weighted if you set weight.
  • least_latency — picks the deployment with the lowest p50 over the last 60s.
  • classifier — scores the prompt's complexity and picks a deployment per-request. This is what gives you the 40–70% savings.

retries and retry_after_ms apply on top of the strategy: if the chosen deployment 5xxs or times out, the router transparently re-tries the next one.

cache

Two layers. Exact-match (the same prompt, again) is a hash lookup. Semantic match (a different but very-similar prompt) uses an embedding similarity threshold; raise it to be conservative, lower it to cache more aggressively. See Semantic caching for LLMs for tuning advice.

Environment variables

| Name | Required | Purpose | |---|---|---| | GATEWAY_LLM_MASTER_KEY | yes | Root admin secret. Pick anything long and random. | | DATABASE_URL | yes | PostgreSQL DSN, e.g. postgres://user:pass@db:5432/gw. | | REDIS_URL | no | Redis DSN. If unset, rate-limits run in-process. | | OPENAI_API_KEY | depends | Required if any deployment uses provider: openai. | | ANTHROPIC_API_KEY | depends | Required if any deployment uses provider: anthropic. | | GEMINI_API_KEY | depends | Required if any deployment uses provider: gemini. | | AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_REGION | depends | For Bedrock deployments. | | OLLAMA_BASE_URL | depends | For self-hosted Ollama deployments. Defaults to http://localhost:11434. |

openai_compat

Controls the OpenAI passthrough layer. When enabled, any /v1/* request that does not match a typed Gateway-LLM handler (like /v1/chat/completions or /v1/models) is forwarded to the upstream OpenAI-compatible provider. This lets Cursor, the OpenAI SDK, and other clients use files, assistants, threads, vector stores, batches, fine-tuning, and future OpenAI endpoints without waiting for typed handler support.

openai_compat:
  enabled: true
  default_model_alias: 'gpt-4o-mini'    # used for model-less endpoints like /v1/files
  # allowed_prefixes and blocked_prefixes have sensible defaults;
  # override only if you need to restrict or extend the passthrough.
  • enabled — defaults to true. Set to false to disable passthrough entirely.
  • default_model_alias — the model alias used to resolve the upstream provider for endpoints that don't include a model field in their request body (e.g. /v1/files, /v1/vector_stores). If unset, those requests will fail with a clear error.
  • allowed_prefixes — list of /v1/* path prefixes the passthrough will forward. Defaults to all known official OpenAI endpoint families.
  • blocked_prefixes — list of /v1/* path prefixes that are never forwarded (Gateway-owned routes like /v1/management, /v1/metrics, /v1/receipts, etc.).

Auth, rate-limiting, model access checks, and audit apply to passthrough requests identically to typed handlers.

Using Gateway-LLM with Cursor

To use Gateway-LLM as a custom OpenAI endpoint in Cursor:

  1. Open Cursor Settings > Models.
  2. Under the OpenAI section, set the API key to your Gateway-LLM key (sk-gatewayllm-...).
  3. Set the Base URL to https://api.gateway-llm.com/v1 (your gateway's URL with /v1 suffix — do not include /chat/completions).
  4. Select a model in Cursor that matches an alias configured in your Gateway-LLM model_list (e.g. gpt-4o-mini).

If the model name Cursor sends doesn't match any alias in your gateway, you'll get a "model not found" error. Either add the model as an alias in your model_list, or create the API key with models: ["*"] to allow all aliases.

Reloading config

Changes to config.yaml are picked up on SIGHUP — no full restart needed:

docker compose kill -s SIGHUP gateway-llm

The reload is atomic. In-flight requests finish on the old config; new requests use the new one.

Production checklist

Before pointing real traffic at the gateway:

  • Master key is in your secret manager, not in .env on disk.
  • config.yaml is checked into source control; secrets are not.
  • TLS terminates in front of the gateway (Caddy / Nginx / your cloud's LB).
  • /metrics is on an internal listener or behind auth.
  • database and redis are persistent — they hold your spend audit trail and rate-limit state.

When all of that's true, see Smart routing to actually start saving money.


Stuck or want a feature? Email the founders directly at mitshawtechnologies@gmail.com. We answer fast.