Virtual API keys

Issue per-team keys with rate limits, model allowlists, and hard budget caps. Revoke a key without rotating provider credentials.

3 min read · updated 2026-04-29

A virtual API key is a token your apps use to call the gateway. It's distinct from your provider keys (which stay in the gateway's environment) — virtual keys are short-lived, scoped, rate-limited, and revocable.

If you've ever rotated an OPENAI_API_KEY because one team leaked it into a Slack channel, you already know why this matters.

Anatomy of a virtual key

{
  "id": "vk_01HZX...",
  "key": "gw_virt_a8f2...e7c1",
  "name": "checkout-service",
  "models": ["gpt-4o-mini", "gpt-4o"],
  "rpm": 120,
  "tpm": 200000,
  "monthly_budget_usd": 50,
  "team": "payments",
  "created_at": "2026-04-29T12:00:00Z"
}

key — the secret your app sends as Authorization: Bearer .... Shown once at creation, stored hashed afterward.
models — allowlist. The key can only call these aliases.
rpm / tpm — requests per minute and tokens per minute. Enforced with counters in Redis if configured, otherwise with in-process counters.
monthly_budget_usd — hard ceiling. Once exceeded, the key 429s until the next month boundary.
team — free-form tag used in spend reports.
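To make "shown once, stored hashed" concrete, here is a minimal sketch of how a secret like this can be stored and checked. The gateway's actual hash algorithm and key length are not stated on this page, so SHA-256 and the helper names (new_virtual_key, hash_key, verify_key) are assumptions for illustration only.

```python
import hashlib
import hmac
import secrets


def new_virtual_key() -> str:
    # The "gw_virt_" prefix comes from this page; the random length is assumed.
    return "gw_virt_" + secrets.token_hex(16)


def hash_key(key: str) -> str:
    # Assumption: SHA-256 hex digest. The gateway may use a different scheme.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def verify_key(presented: str, stored_hash: str) -> bool:
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(hash_key(presented), stored_hash)
```

Only the digest needs to be persisted, which is why the plaintext key cannot be shown a second time.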

Issuing keys

From the admin UI

Open http://localhost:3000, log in with the master key, and click Create virtual key. Copy the gw_virt_... token immediately; it is shown only once.

From the API

curl -X POST http://localhost:8080/admin/keys \
  -H "Authorization: Bearer $GATEWAY_LLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "checkout-service",
    "team": "payments",
    "models": ["gpt-4o-mini", "gpt-4o"],
    "rpm": 120,
    "tpm": 200000,
    "monthly_budget_usd": 50
  }'

The response is the full key object shown under "Anatomy of a virtual key", including the one-time key value. Store that key in your secret manager right away.
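If you provision keys from a script rather than a shell, the same call can be sketched in Python using only the standard library. The endpoint, headers, and body fields mirror the curl command above; the function names are hypothetical, and the response is assumed to be a JSON object.

```python
import json
import urllib.request

ADMIN_URL = "http://localhost:8080/admin/keys"  # same endpoint as the curl call


def build_create_key_request(master_key: str, spec: dict) -> urllib.request.Request:
    # Builds the POST without sending it, so it can be inspected or logged.
    return urllib.request.Request(
        ADMIN_URL,
        data=json.dumps(spec).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {master_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_key(master_key: str, spec: dict) -> dict:
    # Sends the request and parses the response body as JSON (assumed shape).
    req = build_create_key_request(master_key, spec)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

A call such as create_key(os.environ["GATEWAY_LLM_MASTER_KEY"], {"name": "checkout-service", "team": "payments", "models": ["gpt-4o-mini"], "rpm": 120}) would return the key object; read its "key" field once and hand it to your secret manager.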

Rate limiting

Limits are evaluated in this order:

1. TPM (tokens per minute) — counted against the prompt token estimate at request time, plus the completion tokens after the upstream returns.
2. RPM (requests per minute) — one bump per accepted request.
3. Monthly budget — cost in USD, summed across every request that key has made this month.
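The evaluation order above can be sketched as a single admission check. This is an illustration of the ordering only, not the gateway's implementation: the class and function names are hypothetical, the "tpm" and "budget" kind strings are assumed by analogy with the documented "rpm", and window resets are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyLimits:
    rpm: int
    tpm: int
    monthly_budget_usd: float


@dataclass
class KeyUsage:
    requests_this_minute: int = 0
    tokens_this_minute: int = 0
    spend_this_month_usd: float = 0.0


def first_limit_hit(
    limits: KeyLimits, usage: KeyUsage, prompt_token_estimate: int
) -> Optional[str]:
    """Return the kind of the first limit this request would trip, or None."""
    # 1. TPM: the prompt estimate is charged before the request goes upstream.
    if usage.tokens_this_minute + prompt_token_estimate > limits.tpm:
        return "tpm"
    # 2. RPM: one more accepted request must still fit in the window.
    if usage.requests_this_minute + 1 > limits.rpm:
        return "rpm"
    # 3. Monthly budget: blocked once spend has exceeded the cap.
    if usage.spend_this_month_usd > limits.monthly_budget_usd:
        return "budget"
    return None
```

Because TPM is checked first, a request that would trip both TPM and RPM is reported as a TPM hit.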

When a limit is hit, the gateway returns:

HTTP/1.1 429 Too Many Requests
Retry-After: 12
X-Gateway-Limit-Kind: rpm
X-Gateway-Limit-Reset: 12

Your client can read Retry-After and back off. The Prometheus counter gatewayllm_rate_limit_hit_total{kind="rpm"} increments on each hit, so you can alert on it.
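A client-side backoff that honours Retry-After might look like the sketch below. It is transport-agnostic: send is any callable you supply that performs one request and returns (status, headers, body). The function names, the one-second fallback, and the exact-case header lookup are assumptions for illustration.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]  # (status, headers, body)


def retry_delay_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    # Retry-After is read as delay-seconds; a missing or non-numeric value
    # (for example an HTTP-date) falls back to the default.
    raw = headers.get("Retry-After")
    try:
        return max(0.0, float(raw))
    except (TypeError, ValueError):
        return default


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    response = send()
    for _ in range(max_attempts - 1):
        if response[0] != 429:
            break  # success or a non-rate-limit error: stop retrying
        sleep(retry_delay_seconds(response[1]))
        response = send()
    return response
```

Passing sleep as a parameter keeps the helper easy to test, and the last 429 is returned to the caller rather than swallowed once attempts run out.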

Revoking a key

curl -X DELETE http://localhost:8080/admin/keys/vk_01HZX... \
  -H "Authorization: Bearer $GATEWAY_LLM_MASTER_KEY"

In-flight requests on that key finish; new requests get 401 Unauthorized immediately. No provider credentials change.

Scoping by router behaviour

Virtual keys can also pin a routing strategy for the team that holds them:

{
  "name": "compliance-team",
  "models": ["claude-sonnet"],
  "router": { "force_strategy": "least_latency", "shadow": false },
  "rpm": 30
}

That team can only hit claude-sonnet, the router uses least_latency for them specifically, and shadow mode is off — every decision is live.

Recommended structure

For a 5–50 person team:

One key per service, not one key per developer. Services have stable identities; developers churn.
Tag with team, not service name. Spend reports group by team naturally.
Set a monthly budget even if you trust the team. The cap is your fire alarm — not a financial penalty.
Lock models to the cheap-tier alias for batch / async jobs (embed, summarize) and let only customer-facing services hit the flagship.

Auditing

Every request logs an immutable record:

{
  "request_id": "req_01HZ...",
  "key_id": "vk_01HZX...",
  "team": "payments",
  "model_alias": "gpt-4o-mini",
  "deployment": "openai/gpt-4o-mini",
  "prompt_tokens": 312,
  "completion_tokens": 81,
  "cost_usd": 0.000099,
  "duration_ms": 412,
  "ts": "2026-04-29T12:34:56Z"
}

Stream them out with the observability callbacks, aggregate them in your warehouse, and finance gets answers without paging engineering.
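As an example of the aggregation step, the sketch below sums cost_usd per team from exported records. It assumes the records arrive as JSON Lines (one object per line), which this page does not specify; the field names are taken from the record above and spend_by_team is a hypothetical helper.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def spend_by_team(lines: Iterable[str]) -> Dict[str, float]:
    """Sum cost_usd per team from JSON Lines audit records."""
    totals: Dict[str, float] = defaultdict(float)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the export
        record = json.loads(line)
        totals[record["team"]] += record["cost_usd"]
    return dict(totals)
```

With a file export, spend_by_team(open("audit.jsonl")) would give a team-to-dollars mapping; the file name is only a placeholder.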

What's next

Virtual API keys for LLM governance — the why, with team patterns.
Observability — get the spend records into Datadog, Splunk, or your warehouse.

Stuck or want a feature? Email the founders directly at mitshawtechnologies@gmail.com. We answer fast.