
Virtual API Keys: How to Govern LLM Spend Across Teams

Q: What are virtual API keys for LLMs?

Virtual API keys are short, scoped, revocable tokens you issue to your applications. They sit in front of your provider credentials so leaking a virtual key never forces a provider rotation. Each one carries its own rate limits, model allowlist, and budget cap.

Q: Why not just use OpenAI's native API keys per team?

OpenAI keys can't be cross-provider, can't be model-allowlisted, and can't enforce a USD budget. They're also painful to rotate if leaked. Virtual keys give you all of those, plus a single key that works across OpenAI, Anthropic, Google, and your own models.

Q: How do virtual key budgets actually work?

The gateway computes a USD cost for every request against its pricing table, sums the running total per key per month, and rejects calls with HTTP 429 once the budget is exceeded. The window resets at the calendar month boundary by default; you can configure rolling windows.

A practical guide to issuing per-team virtual API keys with rate limits, model allowlists, and hard budget caps — without ever rotating your provider credentials.

Gateway-LLM team · April 18, 2026 · 6 min read

TL;DR

A virtual API key is a short, revocable token your application uses to call the gateway. It sits in front of your provider credentials, so a leaked key is a five-second revocation, not a five-hour key rotation. Each one carries a rate limit (RPM and TPM), a model allowlist, and an optional monthly USD budget. This post covers how to design them, how to scope them per team, and the spend-governance patterns that come naturally once you have them.

The problem they solve

Without a gateway, your three engineering teams share OPENAI_API_KEY. When a developer pushes it to a public GitHub repo at 11 PM:

You revoke OPENAI_API_KEY.
Every running service that holds that key starts failing.
You generate a new one in OpenAI's dashboard.
You distribute it to every service via your secret manager.
You wait for re-deploys to pick it up.
You answer Slack messages from the team that was in the middle of a launch.

This takes hours, and every one of them is lost product time. The leak is a people problem, but the blast radius is an architectural problem.

Virtual API keys turn those 11 PM hours into a five-second DELETE /admin/keys/vk_…. Provider credentials don't change. Other services keep running. The leaked key returns 401 immediately for anything still trying to use it.
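As a rough sketch, this is what that revocation call might look like from a script. The base URL and bearer-token header mirror the curl example further down; the helper name `build_revoke_request`, the placeholder key ID, and the master-key string are illustrative assumptions. The request is only constructed here, not sent.

```python
import urllib.request

# Assumed local gateway address, matching the curl example in this post.
ADMIN_BASE = "http://localhost:8080"


def build_revoke_request(key_id: str, master_key: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE /admin/keys/<id> request."""
    return urllib.request.Request(
        url=f"{ADMIN_BASE}/admin/keys/{key_id}",
        method="DELETE",
        headers={"Authorization": f"Bearer {master_key}"},
    )


# "vk_EXAMPLE" is a placeholder, not a real key ID.
req = build_revoke_request("vk_EXAMPLE", "master-key-from-secret-manager")
print(req.get_method(), req.full_url)
# To actually revoke the key: urllib.request.urlopen(req)
```

Keeping construction separate from sending makes the call easy to dry-run in an incident runbook before anyone hits the real admin endpoint.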

Anatomy of a virtual key

{
  "id": "vk_01HZX...",
  "key": "gw_virt_a8f2...e7c1",
  "name": "checkout-service",
  "team": "payments",
  "models": ["gpt-4o-mini", "gpt-4o"],
  "rpm": 120,
  "tpm": 200000,
  "monthly_budget_usd": 50,
  "router": { "force_strategy": "least_latency" },
  "created_at": "2026-04-29T12:00:00Z"
}

Five things to understand:

key — the bearer token your app sends. Hashed at rest; shown once at creation.
models — the allowlist. The key gets 403 Forbidden if it tries to call an alias not in this list.
rpm / tpm — requests-per-minute / tokens-per-minute. The first ceiling hit returns 429.
monthly_budget_usd — hard cap. Once exceeded, every call returns 429 with X-Gateway-Limit-Kind: budget until the budget window resets (the next calendar month by default).
router — optional per-key router overrides. Lets a compliance team get least_latency even if the global default is classifier.
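The checks above can be sketched as a single admission function. This is a minimal illustration, not the gateway's actual code: the `VirtualKey` class, the `admit` function, the order of the checks, and the inclusive boundaries (`>=`) are all assumptions. Only the status codes and the `X-Gateway-Limit-Kind: budget` header come from the field descriptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VirtualKey:
    models: list[str]
    rpm: Optional[int] = None
    tpm: Optional[int] = None
    monthly_budget_usd: Optional[float] = None


def admit(key, model, requests_this_minute, tokens_this_minute, spent_this_month_usd):
    """Return (status, headers) for an incoming request. Check order is assumed."""
    if model not in key.models:
        return 403, {}  # alias not on the allowlist
    if key.rpm is not None and requests_this_minute >= key.rpm:
        return 429, {}  # request-rate ceiling hit
    if key.tpm is not None and tokens_this_minute >= key.tpm:
        return 429, {}  # token-rate ceiling hit
    if key.monthly_budget_usd is not None and spent_this_month_usd >= key.monthly_budget_usd:
        return 429, {"X-Gateway-Limit-Kind": "budget"}
    return 200, {}


key = VirtualKey(models=["gpt-4o-mini", "gpt-4o"], rpm=120, tpm=200000, monthly_budget_usd=50)
print(admit(key, "claude-sonnet", 0, 0, 0.0))
print(admit(key, "gpt-4o", 1, 10, 50.0))
```

Because both rate limits and the budget return 429, a client that wants to tell them apart would need to inspect the response headers rather than the status code alone.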

Designing keys per team

The pattern that holds up across team sizes:

One key per service, not per developer

Services have stable identities; developers churn. If checkout-service and notifications-service share a key, you can't tell which one is responsible for a spike. Keep them separate from day one.

Use the `team` tag for cost reporting

Spend reports are most useful when aggregated by team: that view answers "who needs to optimise" without naming individuals. Tag every service key with the owning team. Multiple keys can share a team tag.

curl -X POST http://localhost:8080/admin/keys \
  -H "Authorization: Bearer $MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "ml-platform-batch",
    "team": "ml-platform",
    "models": ["embed", "summarize"],
    "rpm": 60,
    "monthly_budget_usd": 200
  }'
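To illustrate "one key per service, many keys per team", here is a hedged sketch that builds one create-key request per service under a shared team tag. The endpoint and payload fields follow the curl call above; the helper `build_create_requests` and the second service, `ml-platform-eval`, are hypothetical. Nothing is sent.

```python
import json
import urllib.request

ADMIN_URL = "http://localhost:8080/admin/keys"  # same endpoint as the curl example


def build_create_requests(team, services, master_key):
    """Build one POST request per service, all tagged with the same team."""
    requests = []
    for name, settings in services.items():
        body = {"name": name, "team": team, **settings}
        requests.append(
            urllib.request.Request(
                ADMIN_URL,
                data=json.dumps(body).encode("utf-8"),
                method="POST",
                headers={
                    "Authorization": f"Bearer {master_key}",
                    "Content-Type": "application/json",
                },
            )
        )
    return requests


services = {
    "ml-platform-batch": {"models": ["embed", "summarize"], "rpm": 60, "monthly_budget_usd": 200},
    # Hypothetical second service owned by the same team.
    "ml-platform-eval": {"models": ["summarize"], "rpm": 30, "monthly_budget_usd": 100},
}
reqs = build_create_requests("ml-platform", services, "master-key-from-secret-manager")
for r in reqs:
    print(r.get_method(), json.loads(r.data)["name"])
```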

Lock `models` to the cheapest viable alias

Async / batch jobs (embeddings, summaries, classifications) almost always belong on a cheap, specifically tuned alias. Customer-facing services are the only ones that need flagship access. Restricting the allowlist on batch keys prevents accidental six-figure mistakes.

Set a budget even when you trust the team

The cap is a fire alarm, not a financial penalty. Setting monthly_budget_usd: 200 on a job that normally spends $40 means a runaway loop pages you at $200, not at $20,000. The team that owns the key gets a cost-overrun alert and decides whether to raise the cap or fix the bug.
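A minimal sketch of how a calendar-month cap could be tracked, assuming costs are summed per (year, month) window. The `MonthlyBudget` class, the `request_cost_usd` helper, and the prices in `PRICING` are all made up for illustration; they are not the gateway's pricing table or implementation.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical USD prices per 1M tokens (input, output). Not real pricing.
PRICING = {"cheap-alias": (0.10, 0.40)}


def request_cost_usd(alias, prompt_tokens, completion_tokens):
    """Cost of one request under the hypothetical pricing table."""
    price_in, price_out = PRICING[alias]
    return (prompt_tokens * price_in + completion_tokens * price_out) / 1_000_000


class MonthlyBudget:
    """Running spend per calendar month, checked against a hard cap."""

    def __init__(self, cap_usd):
        self.cap_usd = cap_usd
        self.spent = defaultdict(float)  # (year, month) -> USD spent

    def allow(self, ts):
        # Assumption: calls are rejected once spend reaches the cap.
        return self.spent[(ts.year, ts.month)] < self.cap_usd

    def record(self, ts, cost_usd):
        self.spent[(ts.year, ts.month)] += cost_usd


budget = MonthlyBudget(cap_usd=200)
april = datetime(2026, 4, 29, tzinfo=timezone.utc)
may = datetime(2026, 5, 1, tzinfo=timezone.utc)

budget.record(april, 199.0)
still_ok = budget.allow(april)       # under the cap
budget.record(april, 1.5)            # runaway loop pushes spend past $200
blocked = not budget.allow(april)    # further April calls are rejected
reset = budget.allow(may)            # a new calendar month starts from zero
print(still_ok, blocked, reset)
```

Keying the running total by calendar month means the reset needs no scheduled job: a new month simply starts from an empty bucket.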

Concrete team archetypes

Three patterns we see most often:

Customer-facing service

{
  "name": "support-chatbot",
  "team": "customer-success",
  "models": ["auto"],
  "rpm": 600,
  "tpm": 1000000,
  "monthly_budget_usd": 5000,
  "router": { "force_strategy": "classifier" }
}

High RPM (we expect spiky human traffic), generous budget (this is revenue-generating), classifier strategy (we want smart routing because customer prompts span a wide range of difficulty).

Internal automation

{
  "name": "support-ticket-classifier",
  "team": "data",
  "models": ["gpt-4o-mini"],
  "rpm": 30,
  "tpm": 80000,
  "monthly_budget_usd": 100
}

Low RPM (this runs on a cron), tight budget (it's batch work, not revenue), pinned to gpt-4o-mini because we already know mini-tier is sufficient for the task.

Compliance / regulated workload

{
  "name": "legal-redaction",
  "team": "legal",
  "models": ["claude-sonnet"],
  "rpm": 20,
  "monthly_budget_usd": 800,
  "router": {
    "force_strategy": "least_latency",
    "allow_providers": ["anthropic"]
  }
}

Locked to a single alias and a single provider (compliance contracts often forbid certain vendors), conservative router (no classifier routing around the contractually mandated provider), modest RPM.
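One way to picture the provider lock is as a filter over candidate deployments. This is a sketch under stated assumptions: the "provider/model" deployment naming is inferred from the audit-row example in this post, and `filter_deployments` plus the `other-cloud` deployment are hypothetical.

```python
def filter_deployments(deployments, allow_providers=None):
    """Keep only deployments whose provider prefix is on the allowlist."""
    if not allow_providers:
        return list(deployments)  # no restriction configured
    allowed = set(allow_providers)
    return [d for d in deployments if d.split("/", 1)[0] in allowed]


# Hypothetical candidates for the "claude-sonnet" alias.
candidates = ["anthropic/claude-sonnet", "other-cloud/claude-sonnet"]
eligible = filter_deployments(candidates, allow_providers=["anthropic"])
print(eligible)
```

If the filter runs before the routing strategy, least_latency can only ever choose among providers the contract permits.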

Day 30 governance review

After thirty days of running on virtual keys, do a sweep:

Which keys hit their budget cap? Either the cap was wrong (raise it) or the spend pattern is wrong (investigate). Both deserve attention.
Which keys never came close to their RPM? Lower the limit. Tight RPM caps are early-warning systems for runaway loops.
Which keys are still on models: ["*"]? Pin them. The audit trail gets far more useful when every key has a tight allowlist.
Which teams have multiple keys with the same team tag? That's fine. It's how you should be set up.
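The sweep above could be scripted roughly as follows. The input shapes (`keys` as key-config dictionaries, `usage` as per-key monthly stats), the `review` function, and the 25% threshold for "never came close" are assumptions chosen for illustration.

```python
def review(keys, usage, rpm_slack=0.25):
    """Return (key name, finding) pairs for a day-30 governance sweep."""
    findings = []
    for k in keys:
        u = usage.get(k["name"], {})
        budget = k.get("monthly_budget_usd")
        if budget is not None and u.get("spend_usd", 0.0) >= budget:
            findings.append((k["name"], "hit budget cap"))
        rpm = k.get("rpm")
        # Assumed rule: a peak below 25% of the limit means the limit is too loose.
        if rpm and u.get("peak_rpm", 0) < rpm_slack * rpm:
            findings.append((k["name"], "RPM limit far above observed peak"))
        if "*" in k.get("models", []):
            findings.append((k["name"], "wildcard model allowlist"))
    return findings


# Made-up sample data for two keys.
keys = [
    {"name": "a", "models": ["*"], "rpm": 100, "monthly_budget_usd": 50},
    {"name": "b", "models": ["gpt-4o-mini"], "rpm": 30, "monthly_budget_usd": 100},
]
usage = {
    "a": {"spend_usd": 50.0, "peak_rpm": 10},
    "b": {"spend_usd": 20.0, "peak_rpm": 25},
}
findings = review(keys, usage)
for name, finding in findings:
    print(name, finding)
```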

Auditing

Every request through a virtual key writes a structured audit row:

{
  "request_id": "req_01HZ...",
  "key_id": "vk_01HZX...",
  "team": "payments",
  "model_alias": "gpt-4o-mini",
  "deployment": "openai/gpt-4o-mini",
  "prompt_tokens": 312,
  "completion_tokens": 81,
  "cost_usd": 0.000099,
  "duration_ms": 412,
  "ts": "2026-04-29T12:34:56Z"
}

Stream those rows to Snowflake / BigQuery / your warehouse via the observability callbacks, and you can answer any spend question with SQL. "What did the payments team spend last month, broken down by alias?" is a GROUP BY team, model_alias, date_trunc('month', ts).
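For readers without a warehouse handy, the same grouping can be sketched in plain Python over exported audit rows. The `spend_by_team_alias_month` function is illustrative; the first row reuses values from the example above, and the second row is made up.

```python
from collections import defaultdict


def spend_by_team_alias_month(rows):
    """Sum cost_usd per (team, model_alias, 'YYYY-MM') from audit rows."""
    totals = defaultdict(float)
    for row in rows:
        month = row["ts"][:7]  # "YYYY-MM" prefix of the ISO-8601 timestamp
        totals[(row["team"], row["model_alias"], month)] += row["cost_usd"]
    return dict(totals)


rows = [
    {"team": "payments", "model_alias": "gpt-4o-mini", "cost_usd": 0.000099,
     "ts": "2026-04-29T12:34:56Z"},
    # Made-up second row for the same team, alias, and month.
    {"team": "payments", "model_alias": "gpt-4o-mini", "cost_usd": 0.0002,
     "ts": "2026-04-30T08:00:00Z"},
]
totals = spend_by_team_alias_month(rows)
for group, usd in totals.items():
    print(group, usd)
```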

Try it

If you're already running Gateway-LLM, virtual keys are in the admin UI under Keys → Create. If you're not, the Quickstart gets you running in five minutes; the second curl example after that creates your first virtual key.

Provider keys stay in your secret manager. Virtual keys live in Postgres, scoped, observable, and revocable. That's the whole shape.

FAQ

What are virtual API keys for LLMs?
Short, scoped, revocable tokens you issue to your apps. They sit in front of your provider credentials so a leaked virtual key never forces a provider rotation.

Why not just use OpenAI's native API keys per team?
Provider keys can't be cross-provider, model-allowlisted, or budget-capped. Virtual keys give you all of those.

How do virtual key budgets actually work?
The gateway sums per-request costs against your monthly cap and 429s once you exceed it. Default reset is calendar-monthly; rolling windows are configurable.

Run Gateway-LLM in five minutes.

Open-source, OpenAI-compatible, and Apache 2.0. The Quickstart is a single docker compose up.
