One API. Every LLM.
Always the right one.
Connect every model — OpenAI, Anthropic, Google, Llama, your own — and let smart routing pick the cheapest, fastest one that clears your quality bar. On every request, in microseconds.
- One-line migration
- Multi-provider
- Open source
- <11µs overhead
- ›[trace abf9…] email → <pii:email:0>
- ›[trace abf9…] phone → <pii:phone:0>
- ›vault: in-memory, TTL 30s
backend/internal/. Click any stage on the diagram to open its deep-dive.Most LLM requests don’t need
your most expensive model.
You wired your app to GPT-5.5. It works. But most requests — extractions, classifications, summaries — don’t need it. They could run on models that are 10–20× cheaper, with no visible difference. Multiply that across every request.
- Every request hits the same model — even when it shouldn’t.Trivial requests pay flagship prices.
- One provider outage = your product goes down.No automatic fallback to a different vendor.
- Your costs increase without warning.Pricing changes upstream silently inflate your bill.
- New, cheaper models exist — but switching is painful.Releases, rollbacks and regression testing slow you down.
- “We’ll build routing later” never happens.It becomes a months-long internal project that nobody owns.
This is exactly what smart routing solves.
- Summarize document
- Extract entities
- Classify ticket
- Translate text
- Generate report
Expensive. Slow. Single point of failure.
- Summarize document
- Extract entities
- Classify ticket
- Translate text
- Generate report
- gpt-4o-miniFast
- Llama 3.1 70BVery Fast
- claude-3-haikuCheap
- Mixtral 8x22BCheap
- Custom ModelPrivate
- Lower costsPay only for what you need.
- Faster responsesRoute to the fastest available model.
- Higher reliabilityAutomatic failover across providers.
- Future-proofAdopt new models instantly.
live, in your browser, on your own prompt
Watch a real request get rerouted in front of you.
Type a prompt. Pick a routing mode. We classify it server-side, pick the right model from a five-rung OpenAI ladder ( gpt-5, gpt-5-mini, gpt-5-nano, gpt-4o, gpt-4o-mini) — and on free-first mode, simulated routes to llama-3.3-70b and claude-haiku. Real reply streams in, dollars saved are computed against pinning the same call to the flagship.
3 free runs per browser session · prompts are not logged · keys stay server-side · simulated routes show a canned reply, real ones stream from OpenAI.
Like what you see? Run the same router in your own stack — same binary, no usage caps.
Explore everything you can do with Gateway-LLM.
Smart routing is the centre of the product. Around it sits the rest of the platform — governance, observability, deployment, and the SDKs that make it drop-in. Tap any card to read the deep-dive.
Smart Routing
Read postScore each prompt and pick the cheapest model that clears your quality bar.
Learn moreMulti-Provider Failover
Read postPrefer one provider, fall back to another in microseconds when health drops.
Learn moreVirtual API Keys
Read postIssue per-team keys with rate limits, model allowlists, and budgets.
Learn moreSemantic Caching
Read postCache near-duplicate prompts safely, with a tunable similarity threshold.
Learn moreLiteLLM Migration
Read postA one-line base_url swap and your existing OpenAI app is on the gateway.
Learn moreOpen-Source Architecture
Read postThe router, classifier, and policy engine are all in the open repo.
Learn moreObservability & Audit
Read docsPrometheus, OpenTelemetry, Datadog, Langfuse — wire it to whatever you run.
Learn moreSDKs & OpenAI Compatibility
Read docsDrop-in for the OpenAI Python, TypeScript, and Go SDKs. No rewrites.
Learn moreSelf-Host & Configuration
Read docsSingle binary, single config file. Run it on your VPC or on-prem.
Learn moreThree steps. The rest is invisible.
No new SDK to learn, no infrastructure to operate, no model list to maintain in your app code.
Point your existing OpenAI client at the gateway URL. Same SDK, same parameters, same response shape. Zero code rewrites in your app.
A local classifier reads the prompt and assigns a complexity score in microseconds. Your policy rules — region, budget, allowed providers, fallback order — are applied at the same time.
The cheapest model that clears your quality bar for that bucket gets the call. If it errors or slows, the gateway retries against the next-best provider before your user notices.
Every decision is logged with the score, the chosen model, and the dollars saved — visible in your dashboard the moment the response returns.
Drop-in today. Faster than your network.
Three claims engineering leads ask about: does it work with my stack, is it fast enough to live in the hot path, and can the savings actually be measured.
Speaks the OpenAI Chat Completions API verbatim. Every SDK that talks to OpenAI talks to Gateway-LLM unchanged — JavaScript, Python, Go, curl. Migrating from LiteLLM or Portkey is a one-command port; routes, virtual keys, and budgets carry over.
Median routing decision: 9.2 µs. p99 under load: 10.8 µs. The classifier runs in the same process as the proxy — no extra network round-trip, no Redis lookup, no Lambda cold start. Reproducible from make bench in the repo.
On a replayed customer trace of 12.4M requests, smart routing cut spend 47% versus pinning to gpt-4o, with no measurable change in human-eval quality scores. The trace replay tool ships in the repo so you can run it on your own logs before signing anything.
We make money when you save money. Not before.
Gateway-LLM is built to cut your model spend, not pad it. The full router, failover, caching, and budgets are open source and free forever. Hosted plans add convenience, analytics, and team controls — never a per-token markup.
Free
Self-hostedThe whole product. None of the limits.
- Smart Routing — full classifier and tier-based routing on every request
- Automatic failover across providers (OpenAI, Anthropic, Google, Mistral, Bedrock, Ollama, your own models)
- Response caching with semantic similarity matching
- Per-team and per-key budgets with hard cutoffs
- OpenAI-compatible endpoint — drop-in for every existing SDK
- Apache 2.0 license — read the source, fork it, run it on your VPC
No usage caps. No telemetry phoning home. No "community edition" tax. The hosted version uses the same binary you do.
Custom
Hosted, governed, or bothYour VPC, your contract, your savings.
- Managed cloud — we run it, patch it, scale it (or single-tenant in your VPC, on-prem, or air-gapped)
- Routing insights & spend dashboard — per-route savings, model mix, anomaly detection
- SSO (Okta / Google Workspace / Azure AD), SCIM provisioning, team roles, approval flows
- 99.9% SLA with credits, named on-call, contractual response times
- Verifiable receipts — Ed25519-signed, chained, replayable for audit
- Compliance — SOC 2 Type II, HIPAA-ready, GDPR DPA, ISO 27001 in flight
- Custom routing policies, written with our team and owned by you
Two ways to pay: a flat annual contract, or a small percentage of the dollars Smart Routing reduces. If we don't reduce your spend, you don't pay. Procurement-friendly: security review packs, vendor questionnaires answered, MSAs negotiated.
| Feature | Free | Custom |
|---|---|---|
| Smart Routing classifier | ||
| Cross-provider failover | ||
| Response caching | ||
| Per-team budgets | ||
| OpenAI SDK drop-in | ||
| Hosted, managed, autoscaled | — | |
| Routing analytics & spend dashboard | — | |
| SSO, SCIM, team roles | — | |
| SLA | — | 99.9% / Custom |
| Policy as code, signed receipts | Self-host | |
| SOC 2, HIPAA, custom DPA | — | |
| Private / air-gapped deployment | DIY | |
| Pricing model | $0 forever | % of savings or annual |
Pricing FAQ
Three things buyers always ask.
- You don’t want to operate the gateway yourself, but you want the savings now.
- You need spend visibility per team, per route, per customer — without writing your own analytics pipeline.
- Your team needs SSO, RBAC, and audit trails before security will sign off on production rollout.
- You want a 99.9% SLA, named on-call, and signed receipts for audit instead of a Discord channel.
- Procurement is asking for a SOC 2 report, MSA, or single-tenant / on-prem deployment.
If you’re a solo developer or a small team running internal tools, stay on Free. The router is the same binary; the only thing you give up is the dashboard, the SLA, and the procurement-friendly paperwork.
Smart routing pays for itself.
Run the open-source binary today. Wire up Pro the day spend visibility starts costing you more time than it would cost in a subscription. Talk to us about Enterprise the day procurement asks for a SOC 2 report.
Production-grade controls, on day one.
Security and compliance aren't a roadmap item — they're how the system was designed.
Every response is signed and chained with Ed25519. Replay any historical request, prove its routing decision, and verify nothing was tampered with end-to-end.
Express region-pinning, allowed providers, per-team budgets, and approval flows in version-controlled YAML. Reviewable in a PR, enforced at the gateway.
Email, phone, SSN, and credit-card patterns are detected before the prompt ever leaves your network, then rehydrated on the response stream.
Okta, Azure AD, Google Workspace. Every admin action is logged with actor, timestamp, and signed event ID.
Run it inside your VPC for maximum isolation, or let us host with single-tenant ingress, dedicated keys, and SOC 2 controls.
What you give up by going elsewhere.
The alternatives are good products. They're built around different bets. Here are the dimensions that actually matter when smart routing is the goal.
| Dimension | Gateway-LLM | LiteLLM | Portkey | Build it yourself |
|---|---|---|---|---|
| OpenAI-compatible drop-in | Maybe | |||
| Smart routing as core feature | Plugin | Paid | ||
| Routing decision in <11µs | You build it | |||
| Cross-provider failover | You build it | |||
| Per-team budgets & spend caps | You build it | |||
| Routing analytics & savings reporting | Paid | You build it | ||
| Open-source license | Apache 2.0 | MIT | Closed core | — |
| Self-host on your VPC | Limited | |||
| Per-token markup | $0 | $0 | $0 | $0 |
| SOC 2 / HIPAA hosted plan | Your problem |
Best if your priority is the absolute longest provider list. Smart routing is a community plugin, not a first-class feature.
Strong dashboards. Smart routing and many enterprise primitives are gated behind a paid tier; OSS core is intentionally minimal.
Cheapest on day one. Most expensive by month six — you'll write the classifier, the failover, the spend reports, the SOC 2 evidence, and the on-call.
Frequently asked questions.
Ready to take control of your LLM traffic?
Join hundreds of companies that switched to Gateway-LLM. Open-source, MIT-licensed, and ready for production.