Smart routing. Open source. Live now.

One API. Every LLM.
Always the right one.

Connect every model — OpenAI, Anthropic, Google, Llama, your own — and let smart routing pick the cheapest, fastest one that clears your quality bar. On every request, in microseconds.

One-line migration
Multi-provider
Open source
<11µs overhead

Start free See smart routing work

trace abf9e2…

overhead 9.2µs

REDACT

live

›[trace abf9…] email → <pii:email:0>
›[trace abf9…] phone → <pii:phone:0>
›vault: in-memory, TTL 30s

Every box is an actual piece of running Go code under backend/internal/. Click any stage on the diagram to open its deep-dive.

Most LLM requests don’t need
your most expensive model.

You wired your app to GPT-5.5. It works. But most requests — extractions, classifications, summaries — don’t need it. They could run on models that are 10–20× cheaper, with no visible difference. Multiply that across every request.

Teams save

50–80%

by routing requests to the right model.

Every request hits the same model — even when it shouldn’t.
Trivial requests pay flagship prices.
One provider outage = your product goes down.
No automatic fallback to a different vendor.
Your costs increase without warning.
Pricing changes upstream silently inflate your bill.
New, cheaper models exist — but switching is painful.
Releases, rollbacks and regression testing slow you down.
“We’ll build routing later” never happens.
It becomes a months-long internal project that nobody owns.

This is exactly what smart routing solves.

Before — one model for everything

Summarize document
Extract entities
Classify ticket
Translate text
Generate report

GPT-5.5

flagship only

Expensive. Slow. Single point of failure.

With Gateway-LLM

Summarize document
Extract entities
Classify ticket
Translate text
Generate report

Gateway-LLM

Smart Routing Engine

CostSpeedQualityReliability

gpt-4o-miniFast
Llama 3.1 70BVery Fast
claude-3-haikuCheap
Mixtral 8x22BCheap
Custom ModelPrivate

Lower costs
Pay only for what you need.
Faster responses
Route to the fastest available model.
Higher reliability
Automatic failover across providers.
Future-proof
Adopt new models instantly.

live, in your browser, on your own prompt

Watch a real request get rerouted in front of you.

Type a prompt. Pick a routing mode. We classify it server-side, pick the right model from a five-rung OpenAI ladder ( gpt-5, gpt-5-mini, gpt-5-nano, gpt-4o, gpt-4o-mini) — and on free-first mode, simulated routes to llama-3.3-70b and claude-haiku. Real reply streams in, dollars saved are computed against pinning the same call to the flagship.

3 free runs per browser session · prompts are not logged · keys stay server-side · simulated routes show a canned reply, real ones stream from OpenAI.

free demo quota

3 / 3left

prompt

routing mode

81 chars · ~21 tokens

live gateway telemetryidle

// hit "Send request" to fire a live call.

// the router decides in < 1 µs server-side,

// then streams the chosen provider back to you.

Like what you see? Run the same router in your own stack — same binary, no usage caps.

Start free Read the 60-second QuickStart

Platform capabilities

Explore everything you can do with Gateway-LLM.

Smart routing is the centre of the product. Around it sits the rest of the platform — governance, observability, deployment, and the SDKs that make it drop-in. Tap any card to read the deep-dive.

Smart Routing

Read post

Score each prompt and pick the cheapest model that clears your quality bar.

Learn more

Multi-Provider Failover

Read post

Prefer one provider, fall back to another in microseconds when health drops.

Learn more

Virtual API Keys

Read post

Issue per-team keys with rate limits, model allowlists, and budgets.

Learn more

Semantic Caching

Read post

Cache near-duplicate prompts safely, with a tunable similarity threshold.

Learn more

LiteLLM Migration

Read post

A one-line base_url swap and your existing OpenAI app is on the gateway.

Learn more

Open-Source Architecture

Read post

The router, classifier, and policy engine are all in the open repo.

Learn more

Observability & Audit

Read docs

Prometheus, OpenTelemetry, Datadog, Langfuse — wire it to whatever you run.

Learn more

SDKs & OpenAI Compatibility

Read docs

Drop-in for the OpenAI Python, TypeScript, and Go SDKs. No rewrites.

Learn more

Self-Host & Configuration

Read docs

Single binary, single config file. Run it on your VPC or on-prem.

Learn more

End-to-end in plain English

Three steps. The rest is invisible.

No new SDK to learn, no infrastructure to operate, no model list to maintain in your app code.

STEP 01

Send one request

Point your existing OpenAI client at the gateway URL. Same SDK, same parameters, same response shape. Zero code rewrites in your app.

STEP 02

Gateway scores complexity & policy

A local classifier reads the prompt and assigns a complexity score in microseconds. Your policy rules — region, budget, allowed providers, fallback order — are applied at the same time.

STEP 03

Best model & provider chosen automatically

The cheapest model that clears your quality bar for that bucket gets the call. If it errors or slows, the gateway retries against the next-best provider before your user notices.

Every decision is logged with the score, the chosen model, and the dollars saved — visible in your dashboard the moment the response returns.

Receipts, not promises

Drop-in today. Faster than your network.

Three claims engineering leads ask about: does it work with my stack, is it fast enough to live in the hot path, and can the savings actually be measured.

<11µs

p99 routing overhead

40–70%

median spend reduction

100%

OpenAI SDK compatibility

Compatibility

Drops in where you already are.

Speaks the OpenAI Chat Completions API verbatim. Every SDK that talks to OpenAI talks to Gateway-LLM unchanged — JavaScript, Python, Go, curl. Migrating from LiteLLM or Portkey is a one-command port; routes, virtual keys, and budgets carry over.

Performance

In-process. No extra hop.

Median routing decision: 9.2 µs. p99 under load: 10.8 µs. The classifier runs in the same process as the proxy — no extra network round-trip, no Redis lookup, no Lambda cold start. Reproducible from make bench in the repo.

Benchmarks

Run it on your own logs.

On a replayed customer trace of 12.4M requests, smart routing cut spend 47% versus pinning to gpt-4o, with no measurable change in human-eval quality scores. The trace replay tool ships in the repo so you can run it on your own logs before signing anything.

Honest pricing

We make money when you save money. Not before.

Gateway-LLM is built to cut your model spend, not pad it. The full router, failover, caching, and budgets are open source and free forever. Hosted plans add convenience, analytics, and team controls — never a per-token markup.

Zero per-token markupOne-line migrationBring your own provider keysCancel anytime

Free

Self-hosted

$0forever

The whole product. None of the limits.

Smart Routing — full classifier and tier-based routing on every request
Automatic failover across providers (OpenAI, Anthropic, Google, Mistral, Bedrock, Ollama, your own models)
Response caching with semantic similarity matching
Per-team and per-key budgets with hard cutoffs
OpenAI-compatible endpoint — drop-in for every existing SDK
Apache 2.0 license — read the source, fork it, run it on your VPC

No usage caps. No telemetry phoning home. No "community edition" tax. The hosted version uses the same binary you do.

Get the binary Read the QuickStart

Custom

Hosted, governed, or both

Customannual or % of savings

Your VPC, your contract, your savings.

Managed cloud — we run it, patch it, scale it (or single-tenant in your VPC, on-prem, or air-gapped)
Routing insights & spend dashboard — per-route savings, model mix, anomaly detection
SSO (Okta / Google Workspace / Azure AD), SCIM provisioning, team roles, approval flows
99.9% SLA with credits, named on-call, contractual response times
Verifiable receipts — Ed25519-signed, chained, replayable for audit
Compliance — SOC 2 Type II, HIPAA-ready, GDPR DPA, ISO 27001 in flight
Custom routing policies, written with our team and owned by you

Two ways to pay: a flat annual contract, or a small percentage of the dollars Smart Routing reduces. If we don't reduce your spend, you don't pay. Procurement-friendly: security review packs, vendor questionnaires answered, MSAs negotiated.

Talk to the founders Download the security pack

Feature	Free	Custom
Smart Routing classifier
Cross-provider failover
Response caching
Per-team budgets
OpenAI SDK drop-in
Hosted, managed, autoscaled	—
Routing analytics & spend dashboard	—
SSO, SCIM, team roles	—
SLA	—	99.9% / Custom
Policy as code, signed receipts	Self-host
SOC 2, HIPAA, custom DPA	—
Private / air-gapped deployment	DIY
Pricing model	$0 forever	% of savings or annual

Pricing FAQ

Three things buyers always ask.

Because Smart Routing only works if every team can trust how it makes decisions. The classifier, the policy engine, the failover logic — all of it is in the open repo, under Apache 2.0, with no usage limits and no “community edition” carve-outs. We sell convenience and governance on top. We don’t sell the router itself.

Most teams send every prompt to a flagship model just in case. But the majority of LLM traffic — extractions, classifications, short summaries — runs perfectly on a cheaper tier. Gateway-LLM scores each prompt’s complexity in microseconds and routes the easy ones to mini-tier models, while keeping hard prompts on the flagship. On replayed customer traces we see 40–70% reduction in model spend with no measurable change in quality scores. We never add a per-token markup, so the savings stay yours — you only pay us for hosted-plan convenience and observability.

Move to Custom when one of these is true:

You don’t want to operate the gateway yourself, but you want the savings now.
You need spend visibility per team, per route, per customer — without writing your own analytics pipeline.
Your team needs SSO, RBAC, and audit trails before security will sign off on production rollout.
You want a 99.9% SLA, named on-call, and signed receipts for audit instead of a Discord channel.
Procurement is asking for a SOC 2 report, MSA, or single-tenant / on-prem deployment.

If you’re a solo developer or a small team running internal tools, stay on Free. The router is the same binary; the only thing you give up is the dashboard, the SLA, and the procurement-friendly paperwork.

Smart routing pays for itself.

Run the open-source binary today. Wire up Pro the day spend visibility starts costing you more time than it would cost in a subscription. Talk to us about Enterprise the day procurement asks for a SOC 2 report.

Start free Talk to the founders Read the docs

Built for teams under audit

Production-grade controls, on day one.

Security and compliance aren't a roadmap item — they're how the system was designed.

Verifiable receipts

Every response is signed and chained with Ed25519. Replay any historical request, prove its routing decision, and verify nothing was tampered with end-to-end.

Policy as code

Express region-pinning, allowed providers, per-team budgets, and approval flows in version-controlled YAML. Reviewable in a PR, enforced at the gateway.

PII redaction at the edge

Email, phone, SSN, and credit-card patterns are detected before the prompt ever leaves your network, then rehydrated on the response stream.

SSO, SCIM, audit logs

Okta, Azure AD, Google Workspace. Every admin action is logged with actor, timestamp, and signed event ID.

Self-host or hosted, same controls

Run it inside your VPC for maximum isolation, or let us host with single-tenant ingress, dedicated keys, and SOC 2 controls.

SOC 2 Type II, HIPAA-ready, GDPR DPA — security review packs available on request.

Request the security pack

Honest comparison

What you give up by going elsewhere.

The alternatives are good products. They're built around different bets. Here are the dimensions that actually matter when smart routing is the goal.

Dimension	Gateway-LLM	LiteLLM	Portkey	Build it yourself
OpenAI-compatible drop-in				Maybe
Smart routing as core feature		Plugin	Paid
Routing decision in <11µs				You build it
Cross-provider failover				You build it
Per-team budgets & spend caps				You build it
Routing analytics & savings reporting			Paid	You build it
Open-source license	Apache 2.0	MIT	Closed core	—
Self-host on your VPC			Limited
Per-token markup	$0	$0	$0	$0
SOC 2 / HIPAA hosted plan				Your problem

LiteLLM

Best if your priority is the absolute longest provider list. Smart routing is a community plugin, not a first-class feature.

Portkey

Strong dashboards. Smart routing and many enterprise primitives are gated behind a paid tier; OSS core is intentionally minimal.

Build it yourself

Cheapest on day one. Most expensive by month six — you'll write the classifier, the failover, the spend reports, the SOC 2 evidence, and the on-call.

FAQ

Frequently asked questions.

Gateway-LLM is written in Go and measures ~50× less overhead per request (under 11µs vs ~500µs). The core feature set — virtual keys, budgets, caching, failover — is open-source and MIT-licensed, same as LiteLLM. If you need every possible provider LiteLLM supports ~100+; we focus on the 20 that cover >99% of production traffic.

Yes. Azure OpenAI, AWS Bedrock, Google Vertex, Mistral, Groq, Together and Fireworks are first-class providers. Add them to model_list in config.yaml with their API credentials and deploy.

The self-hosted gateway is free and open-source under MIT forever, with unlimited requests. Our managed cloud Pro tier starts at $49/month and includes a 14-day free trial — no credit card required.

Yes. Full SSE streaming is supported for /v1/chat/completions and /v1/responses, with automatic translation between OpenAI, Anthropic and Gemini formats. Tool calling and MCP (Model Context Protocol) are supported across all providers that support them.

Create keys via the management API POST /v1/management/keys or the admin UI. Each key can have per-model access controls, RPM/TPM rate limits, and monthly spend caps. Redis-backed rate limits work across horizontally-scaled instances.

Absolutely. We publish a ~20MB multi-stage Docker image and a Helm chart with HPA, PDBs and pre-configured Prometheus ServiceMonitors. PostgreSQL and Redis are the only dependencies.

Ready to take control of your LLM traffic?

Join hundreds of companies that switched to Gateway-LLM. Open-source, MIT-licensed, and ready for production.

Try in 2 minutes

Run a single command locally.

$ npx gateway-llm start

Get started

Star on GitHub

Join our open-source community.

Star us

Join the newsletter

Product updates & AI engineering tips.

One API. Every LLM.Always the right one.

Most LLM requests don’t need your most expensive model.

Watch a real request get rerouted in front of you.

Explore everything you can do with Gateway-LLM.

Smart Routing

Multi-Provider Failover

Virtual API Keys

Semantic Caching

LiteLLM Migration

Open-Source Architecture

Observability & Audit

SDKs & OpenAI Compatibility

Self-Host & Configuration

Three steps. The rest is invisible.

Drop-in today. Faster than your network.

We make money when you save money. Not before.

Free

Custom

Three things buyers always ask.

Smart routing pays for itself.

Production-grade controls, on day one.

What you give up by going elsewhere.

Frequently asked questions.

Ready to take control of your LLM traffic?

One API. Every LLM.
Always the right one.

Most LLM requests don’t need
your most expensive model.