Smart routing. Open source. Live now.

One API. Every LLM.
Always the right one.

Connect every model — OpenAI, Anthropic, Google, Llama, your own — and let smart routing pick the cheapest, fastest one that clears your quality bar. On every request, in microseconds.

  • One-line migration
  • Multi-provider
  • Open source
  • <11µs overhead
trace abf9e2…
overhead 9.2µs
Redact PIIlocal vaultPolicyregion · providerSmartRouterONNX classifierProviderOpenAI / Claude / …Rehydratestream · token-safeSigned receiptEd25519 · verifiable
REDACT
live
  • [trace abf9…] email → <pii:email:0>
  • [trace abf9…] phone → <pii:phone:0>
  • vault: in-memory, TTL 30s
Every box is an actual piece of running Go code under backend/internal/. Click any stage on the diagram to open its deep-dive.

Most LLM requests don’t need your most expensive model.

You wired your app to GPT-5.5. It works. But most requests — extractions, classifications, summaries — don’t need it. They could run on models that are 10–20× cheaper, with no visible difference. Multiply that across every request.

Teams save
50–80%
by routing requests to the right model.
  • Every request hits the same model — even when it shouldn’t.
    Trivial requests pay flagship prices.
  • One provider outage = your product goes down.
    No automatic fallback to a different vendor.
  • Your costs increase without warning.
    Pricing changes upstream silently inflate your bill.
  • New, cheaper models exist — but switching is painful.
    Releases, rollbacks and regression testing slow you down.
  • “We’ll build routing later” never happens.
    It becomes a months-long internal project that nobody owns.

This is exactly what smart routing solves.

Before — one model for everything
  • Summarize document
  • Extract entities
  • Classify ticket
  • Translate text
  • Generate report
GPT-5.5
flagship only

Expensive. Slow. Single point of failure.

With Gateway-LLM
  • Summarize document
  • Extract entities
  • Classify ticket
  • Translate text
  • Generate report
Gateway-LLM
Smart Routing Engine
CostSpeedQualityReliability
  • gpt-4o-miniFast
  • Llama 3.1 70BVery Fast
  • claude-3-haikuCheap
  • Mixtral 8x22BCheap
  • Custom ModelPrivate
  • Lower costs
    Pay only for what you need.
  • Faster responses
    Route to the fastest available model.
  • Higher reliability
    Automatic failover across providers.
  • Future-proof
    Adopt new models instantly.

live, in your browser, on your own prompt

Watch a real request get rerouted in front of you.

Type a prompt. Pick a routing mode. We classify it server-side, pick the right model from a five-rung OpenAI ladder ( gpt-5, gpt-5-mini, gpt-5-nano, gpt-4o, gpt-4o-mini) — and on free-first mode, simulated routes to llama-3.3-70b and claude-haiku. Real reply streams in, dollars saved are computed against pinning the same call to the flagship.

3 free runs per browser session · prompts are not logged · keys stay server-side · simulated routes show a canned reply, real ones stream from OpenAI.

free demo quota
3 / 3left
81 chars · ~21 tokens
live gateway telemetryidle
// hit "Send request" to fire a live call.
// the router decides in < 1 µs server-side,
// then streams the chosen provider back to you.

Like what you see? Run the same router in your own stack — same binary, no usage caps.

End-to-end in plain English

Three steps. The rest is invisible.

No new SDK to learn, no infrastructure to operate, no model list to maintain in your app code.

STEP 01
Send one request

Point your existing OpenAI client at the gateway URL. Same SDK, same parameters, same response shape. Zero code rewrites in your app.

STEP 02
Gateway scores complexity & policy

A local classifier reads the prompt and assigns a complexity score in microseconds. Your policy rules — region, budget, allowed providers, fallback order — are applied at the same time.

STEP 03
Best model & provider chosen automatically

The cheapest model that clears your quality bar for that bucket gets the call. If it errors or slows, the gateway retries against the next-best provider before your user notices.

Every decision is logged with the score, the chosen model, and the dollars saved — visible in your dashboard the moment the response returns.

Receipts, not promises

Drop-in today. Faster than your network.

Three claims engineering leads ask about: does it work with my stack, is it fast enough to live in the hot path, and can the savings actually be measured.

<11µs
p99 routing overhead
40–70%
median spend reduction
100%
OpenAI SDK compatibility
Compatibility
Drops in where you already are.

Speaks the OpenAI Chat Completions API verbatim. Every SDK that talks to OpenAI talks to Gateway-LLM unchanged — JavaScript, Python, Go, curl. Migrating from LiteLLM or Portkey is a one-command port; routes, virtual keys, and budgets carry over.

Performance
In-process. No extra hop.

Median routing decision: 9.2 µs. p99 under load: 10.8 µs. The classifier runs in the same process as the proxy — no extra network round-trip, no Redis lookup, no Lambda cold start. Reproducible from make bench in the repo.

Benchmarks
Run it on your own logs.

On a replayed customer trace of 12.4M requests, smart routing cut spend 47% versus pinning to gpt-4o, with no measurable change in human-eval quality scores. The trace replay tool ships in the repo so you can run it on your own logs before signing anything.

Honest pricing

We make money when you save money. Not before.

Gateway-LLM is built to cut your model spend, not pad it. The full router, failover, caching, and budgets are open source and free forever. Hosted plans add convenience, analytics, and team controls — never a per-token markup.

Zero per-token markupOne-line migrationBring your own provider keysCancel anytime

Free

Self-hosted
$0forever

The whole product. None of the limits.

  • Smart Routing — full classifier and tier-based routing on every request
  • Automatic failover across providers (OpenAI, Anthropic, Google, Mistral, Bedrock, Ollama, your own models)
  • Response caching with semantic similarity matching
  • Per-team and per-key budgets with hard cutoffs
  • OpenAI-compatible endpoint — drop-in for every existing SDK
  • Apache 2.0 license — read the source, fork it, run it on your VPC

No usage caps. No telemetry phoning home. No "community edition" tax. The hosted version uses the same binary you do.

Most popular

Custom

Hosted, governed, or both
Customannual or % of savings

Your VPC, your contract, your savings.

  • Managed cloud — we run it, patch it, scale it (or single-tenant in your VPC, on-prem, or air-gapped)
  • Routing insights & spend dashboard — per-route savings, model mix, anomaly detection
  • SSO (Okta / Google Workspace / Azure AD), SCIM provisioning, team roles, approval flows
  • 99.9% SLA with credits, named on-call, contractual response times
  • Verifiable receipts — Ed25519-signed, chained, replayable for audit
  • Compliance — SOC 2 Type II, HIPAA-ready, GDPR DPA, ISO 27001 in flight
  • Custom routing policies, written with our team and owned by you

Two ways to pay: a flat annual contract, or a small percentage of the dollars Smart Routing reduces. If we don't reduce your spend, you don't pay. Procurement-friendly: security review packs, vendor questionnaires answered, MSAs negotiated.

FeatureFreeCustom
Smart Routing classifier
Cross-provider failover
Response caching
Per-team budgets
OpenAI SDK drop-in
Hosted, managed, autoscaled
Routing analytics & spend dashboard
SSO, SCIM, team roles
SLA99.9% / Custom
Policy as code, signed receiptsSelf-host
SOC 2, HIPAA, custom DPA
Private / air-gapped deploymentDIY
Pricing model$0 forever% of savings or annual

Pricing FAQ

Three things buyers always ask.

Because Smart Routing only works if every team can trust how it makes decisions. The classifier, the policy engine, the failover logic — all of it is in the open repo, under Apache 2.0, with no usage limits and no “community edition” carve-outs. We sell convenience and governance on top. We don’t sell the router itself.
Most teams send every prompt to a flagship model just in case. But the majority of LLM traffic — extractions, classifications, short summaries — runs perfectly on a cheaper tier. Gateway-LLM scores each prompt’s complexity in microseconds and routes the easy ones to mini-tier models, while keeping hard prompts on the flagship. On replayed customer traces we see 40–70% reduction in model spend with no measurable change in quality scores. We never add a per-token markup, so the savings stay yours — you only pay us for hosted-plan convenience and observability.
Move to Custom when one of these is true:
  • You don’t want to operate the gateway yourself, but you want the savings now.
  • You need spend visibility per team, per route, per customer — without writing your own analytics pipeline.
  • Your team needs SSO, RBAC, and audit trails before security will sign off on production rollout.
  • You want a 99.9% SLA, named on-call, and signed receipts for audit instead of a Discord channel.
  • Procurement is asking for a SOC 2 report, MSA, or single-tenant / on-prem deployment.

If you’re a solo developer or a small team running internal tools, stay on Free. The router is the same binary; the only thing you give up is the dashboard, the SLA, and the procurement-friendly paperwork.

Smart routing pays for itself.

Run the open-source binary today. Wire up Pro the day spend visibility starts costing you more time than it would cost in a subscription. Talk to us about Enterprise the day procurement asks for a SOC 2 report.

Built for teams under audit

Production-grade controls, on day one.

Security and compliance aren't a roadmap item — they're how the system was designed.

Verifiable receipts

Every response is signed and chained with Ed25519. Replay any historical request, prove its routing decision, and verify nothing was tampered with end-to-end.

Policy as code

Express region-pinning, allowed providers, per-team budgets, and approval flows in version-controlled YAML. Reviewable in a PR, enforced at the gateway.

PII redaction at the edge

Email, phone, SSN, and credit-card patterns are detected before the prompt ever leaves your network, then rehydrated on the response stream.

SSO, SCIM, audit logs

Okta, Azure AD, Google Workspace. Every admin action is logged with actor, timestamp, and signed event ID.

Self-host or hosted, same controls

Run it inside your VPC for maximum isolation, or let us host with single-tenant ingress, dedicated keys, and SOC 2 controls.

SOC 2 Type II, HIPAA-ready, GDPR DPA — security review packs available on request.
Request the security pack
Honest comparison

What you give up by going elsewhere.

The alternatives are good products. They're built around different bets. Here are the dimensions that actually matter when smart routing is the goal.

DimensionGateway-LLMLiteLLMPortkeyBuild it yourself
OpenAI-compatible drop-inMaybe
Smart routing as core featurePluginPaid
Routing decision in <11µsYou build it
Cross-provider failoverYou build it
Per-team budgets & spend capsYou build it
Routing analytics & savings reportingPaidYou build it
Open-source licenseApache 2.0MITClosed core
Self-host on your VPCLimited
Per-token markup$0$0$0$0
SOC 2 / HIPAA hosted planYour problem
LiteLLM

Best if your priority is the absolute longest provider list. Smart routing is a community plugin, not a first-class feature.

Portkey

Strong dashboards. Smart routing and many enterprise primitives are gated behind a paid tier; OSS core is intentionally minimal.

Build it yourself

Cheapest on day one. Most expensive by month six — you'll write the classifier, the failover, the spend reports, the SOC 2 evidence, and the on-call.

FAQ

Frequently asked questions.

Gateway-LLM is written in Go and measures ~50× less overhead per request (under 11µs vs ~500µs). The core feature set — virtual keys, budgets, caching, failover — is open-source and MIT-licensed, same as LiteLLM. If you need every possible provider LiteLLM supports ~100+; we focus on the 20 that cover >99% of production traffic.
Yes. Azure OpenAI, AWS Bedrock, Google Vertex, Mistral, Groq, Together and Fireworks are first-class providers. Add them to model_list in config.yaml with their API credentials and deploy.
The self-hosted gateway is free and open-source under MIT forever, with unlimited requests. Our managed cloud Pro tier starts at $49/month and includes a 14-day free trial — no credit card required.
Yes. Full SSE streaming is supported for /v1/chat/completions and /v1/responses, with automatic translation between OpenAI, Anthropic and Gemini formats. Tool calling and MCP (Model Context Protocol) are supported across all providers that support them.
Create keys via the management API POST /v1/management/keys or the admin UI. Each key can have per-model access controls, RPM/TPM rate limits, and monthly spend caps. Redis-backed rate limits work across horizontally-scaled instances.
Absolutely. We publish a ~20MB multi-stage Docker image and a Helm chart with HPA, PDBs and pre-configured Prometheus ServiceMonitors. PostgreSQL and Redis are the only dependencies.

Ready to take control of your LLM traffic?

Join hundreds of companies that switched to Gateway-LLM. Open-source, MIT-licensed, and ready for production.

Try in 2 minutes
Run a single command locally.
$ npx gateway-llm start
Get started
Star on GitHub
Join our open-source community.
Star us
Join the newsletter
Product updates & AI engineering tips.
Subscribe