Virtual API keys
Issue per-team keys with rate limits, model allowlists, and hard budget caps. Revoke a key without rotating provider credentials.
3 min read · updated 2026-04-29
A virtual API key is a token your apps use to call the gateway. It's distinct from your provider keys (which stay in the gateway's environment) — virtual keys are short-lived, scoped, rate-limited, and revocable.
If you've ever rotated an OPENAI_API_KEY because one team leaked it into a Slack channel, you already know why this matters.
Anatomy of a virtual key
{
"id": "vk_01HZX...",
"key": "gw_virt_a8f2...e7c1",
"name": "checkout-service",
"models": ["gpt-4o-mini", "gpt-4o"],
"rpm": 120,
"tpm": 200000,
"monthly_budget_usd": 50,
"team": "payments",
"created_at": "2026-04-29T12:00:00Z"
}
- key: the secret your app sends as Authorization: Bearer .... Shown once at creation, stored hashed afterward.
- models: allowlist. The key can only call these aliases.
- rpm / tpm: requests per minute and tokens per minute. Enforced by Redis if configured, otherwise in-process.
- monthly_budget_usd: hard ceiling. Once exceeded, the key 429s until the next month boundary.
- team: free-form tag used in spend reports.
Issuing keys
From the admin UI
Open http://localhost:3000, log in with the master key, click Create virtual key. Copy the gw_virt_... token immediately — it's shown only once.
From the API
curl -X POST http://localhost:8080/admin/keys \
-H "Authorization: Bearer $GATEWAY_LLM_MASTER_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "checkout-service",
"team": "payments",
"models": ["gpt-4o-mini", "gpt-4o"],
"rpm": 120,
"tpm": 200000,
"monthly_budget_usd": 50
}'
The response is the JSON object above. Stash the key in your secret manager.
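If you provision keys from a script rather than a shell, the same call can be sketched in Python with only the standard library. The endpoint, headers, and body fields are taken from the curl example above; the helper name `build_create_key_request` is hypothetical, and the function only builds the request so you can inspect it before sending.

```python
import json
import urllib.request
from typing import Any, Dict


def build_create_key_request(
    base_url: str, master_key: str, spec: Dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) the POST /admin/keys request."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/admin/keys",
        data=json.dumps(spec).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {master_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Pass the result to `urllib.request.urlopen` to send it, read the `key` field from the JSON response, and write it straight to your secret manager rather than to logs or stdout.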
Rate limiting
Limits are evaluated in this order:
- TPM (tokens per minute) — counted against the prompt token estimate at request time, plus the completion tokens after the upstream returns.
- RPM (requests per minute) — one bump per accepted request.
- Monthly budget — cost in USD, summed across every request that key has made this month.
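The evaluation order above can be sketched as a single function. This is an illustration under stated assumptions, not the gateway's implementation: the `Limits` and `Usage` types, the function name, the boundary comparisons, and the `"tpm"` and `"budget"` labels are all hypothetical (only `"rpm"` appears on this page).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Limits:
    rpm: int
    tpm: int
    monthly_budget_usd: float


@dataclass
class Usage:
    requests_this_minute: int
    tokens_this_minute: int
    spend_this_month_usd: float


def first_exceeded(
    usage: Usage, limits: Limits, prompt_token_estimate: int
) -> Optional[str]:
    """Return the first limit this request would trip, or None if it may proceed."""
    # 1. TPM: tokens already counted this minute plus the prompt estimate.
    if usage.tokens_this_minute + prompt_token_estimate > limits.tpm:
        return "tpm"
    # 2. RPM: accepting this request would add one to the counter.
    if usage.requests_this_minute + 1 > limits.rpm:
        return "rpm"
    # 3. Monthly budget: spend so far has gone past the ceiling.
    if usage.spend_this_month_usd > limits.monthly_budget_usd:
        return "budget"
    return None
```

Checking in a fixed order means a request that violates several limits at once reports only the first one, so the limit kind a client sees is deterministic.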
When a limit is hit, the gateway returns:
HTTP/1.1 429 Too Many Requests
Retry-After: 12
X-Gateway-Limit-Kind: rpm
X-Gateway-Limit-Reset: 12
Your client can read Retry-After and back off. The Prometheus counter gatewayllm_rate_limit_hit_total{kind="rpm"} also increments, so you can alert on it.
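On the client side, honouring the 429 can be as small as the sketch below. It assumes Retry-After carries whole seconds, as in the response shown above; the function name `retry_delay` and the fallback default are hypothetical.

```python
from typing import Mapping, Optional


def retry_delay(
    status: int, headers: Mapping[str, str], default: float = 1.0
) -> Optional[float]:
    """Return seconds to wait before retrying, or None if this is not a 429."""
    if status != 429:
        return None
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("retry-after", "").strip()
    try:
        return float(max(0, int(raw)))
    except ValueError:
        # Header missing or not an integer: fall back to the default delay.
        return default
```

Before sleeping, it is worth checking X-Gateway-Limit-Kind as well: if the limit that tripped is the monthly budget, retrying after a few seconds is unlikely to help, because the key stays blocked until the next month boundary.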
Revoking a key
curl -X DELETE http://localhost:8080/admin/keys/vk_01HZX... \
-H "Authorization: Bearer $GATEWAY_LLM_MASTER_KEY"
In-flight requests on that key finish; new requests get 401 Unauthorized immediately. No provider credentials change.
Scoping by router behaviour
Virtual keys can also pin a routing strategy for the team that holds them:
{
"name": "compliance-team",
"models": ["claude-sonnet"],
"router": { "force_strategy": "least_latency", "shadow": false },
"rpm": 30
}
That team can only hit claude-sonnet, the router uses least_latency for them specifically, and shadow mode is off — every decision is live.
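How a per-key config like this might be resolved for an incoming request is sketched below. The field names come from the JSON above; everything else is an assumption for illustration, including the function name `resolve_strategy`, the use of `PermissionError` for a disallowed model, and the `round_robin` default in the usage note. This page does not say what the gateway returns when a key requests a model outside its allowlist.

```python
from typing import Any, Dict


def resolve_strategy(
    key_cfg: Dict[str, Any], requested_model: str, default_strategy: str
) -> str:
    """Return the routing strategy for this key, enforcing the model allowlist."""
    if requested_model not in key_cfg.get("models", []):
        raise PermissionError(
            f"model {requested_model!r} is not in this key's allowlist"
        )
    # A key without a router block falls back to the gateway-wide default.
    router = key_cfg.get("router") or {}
    return router.get("force_strategy", default_strategy)
```

With the compliance-team config, `resolve_strategy(cfg, "claude-sonnet", "round_robin")` yields `least_latency`, while a key with no `router` block would get whatever default is passed in.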
Recommended structure
For a 5–50 person team:
- One key per service, not one key per developer. Services have stable identities; developers churn.
- Tag with team, not service name. Spend reports group by team naturally.
- Set a monthly budget even if you trust the team. The cap is your fire alarm, not a financial penalty.
- Lock models to the cheap-tier alias for batch / async jobs (embed, summarize) and let only customer-facing services hit the flagship.
Auditing
Every request logs an immutable record:
{
"request_id": "req_01HZ...",
"key_id": "vk_01HZX...",
"team": "payments",
"model_alias": "gpt-4o-mini",
"deployment": "openai/gpt-4o-mini",
"prompt_tokens": 312,
"completion_tokens": 81,
"cost_usd": 0.000099,
"duration_ms": 412,
"ts": "2026-04-29T12:34:56Z"
}
Stream them out with the observability callbacks, aggregate them in your warehouse, and finance gets answers without paging engineering.
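As a rough illustration of the aggregation step, the sketch below sums cost per team from records shaped like the one above. It uses only the `team` and `cost_usd` fields; the function name `spend_by_team` is hypothetical, and in practice this would more likely be a GROUP BY in your warehouse.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping


def spend_by_team(records: Iterable[Mapping[str, Any]]) -> Dict[str, float]:
    """Sum cost_usd per team across a stream of audit records."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[str(record["team"])] += float(record["cost_usd"])
    return dict(totals)
```

Grouping on `key_id` instead of `team` gives the per-service view, which is useful when one team's spend jumps and you need to find the service responsible.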
What's next
- Virtual API keys for LLM governance — the why, with team patterns.
- Observability — get the spend records into Datadog, Splunk, or your warehouse.
Stuck or want a feature? Email the founders directly at mitshawtechnologies@gmail.com. We answer fast.