Universal LLM proxy — 100+ models, one API
OSS proxy that normalizes 100+ LLMs to the OpenAI format. Add routing, fallbacks, caching, and cost tracking in one layer.
A gateway that normalizes calls across providers — one API for all models, with fallbacks
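"Normalizes to the OpenAI format" means every provider call reduces to the same chat-completions payload, with the provider chosen by the model string. A minimal sketch of that payload shape (model names here are illustrative, not a complete list):

```python
def chat_payload(model: str, prompt: str) -> dict:
    """Build the OpenAI-format chat payload used for any provider."""
    return {
        "model": model,  # e.g. "gpt-4o", "anthropic/claude-3-5-sonnet", "ollama/llama3"
        "messages": [{"role": "user", "content": prompt}],
    }

# The same payload shape works regardless of which provider serves the model.
openai_req = chat_payload("gpt-4o", "Summarize this diff.")
claude_req = chat_payload("anthropic/claude-3-5-sonnet", "Summarize this diff.")
```

Swapping providers is then a one-string change in the payload; routing, fallbacks, and cost tracking happen behind that single interface.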
AIchitect's Genome scanner detects LiteLLM in your project via these signals: the litellm package, the LITELLM_API_KEY and LITELLM_BASE_URL environment variables, and a litellm_config.yaml file.
Continue points to LiteLLM's OpenAI-compatible proxy endpoint, routing all completions and chat through any provider LiteLLM supports.
→ Model flexibility in Continue without changing editor config — swap providers at the LiteLLM level.
Aider accepts LiteLLM's OpenAI-compatible proxy as its model backend via the --openai-api-base flag.
→ Model-agnostic Aider sessions — route to Claude, GPT-4o, Gemini, or local models through one LiteLLM config.
OpenHands routes all LLM calls through LiteLLM, making it model-agnostic — swap Claude, GPT-4o, or a local model via config.
→ Model flexibility for autonomous agents without changing OpenHands code — route to the best model per task type.
Plandex accepts LiteLLM-proxied endpoints as its model backend, routing its multi-step planning to any provider.
→ Model-agnostic long-horizon coding plans — use Claude for reasoning-heavy planning, cheaper models for mechanical steps.
CrewAI routes LLM calls through LiteLLM, making any model accessible to any agent in the crew via a unified API.
→ Mix models across agents in the same crew — one agent on Claude, another on GPT-4o — through one LiteLLM config change.
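Mixing models across agents reduces to a role-to-model mapping that each agent resolves before calling the gateway. A sketch of that pattern (role names and model identifiers are hypothetical examples, not a fixed API):

```python
# Hypothetical role-to-model map: each agent in a crew resolves its model
# name here, and the gateway routes the call to the matching provider.
AGENT_MODELS = {
    "researcher": "anthropic/claude-3-5-sonnet",  # reasoning-heavy role
    "writer": "gpt-4o",                           # drafting role
    "formatter": "ollama/llama3",                 # cheap local role
}

def model_for(role: str) -> str:
    """Return the model an agent role should use, with a safe default."""
    return AGENT_MODELS.get(role, "gpt-4o-mini")
```

Because every model name resolves through the same proxy, reassigning a role to a different provider is a one-line edit to the map.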
AutoGen accepts LiteLLM-proxied endpoints as model backends via its model_client configuration.
→ Provider-flexible AutoGen agents — route different agent roles to different models through one LiteLLM config.
LangGraph nodes call LiteLLM-proxied endpoints, routing each node's LLM calls to any provider without changing graph code.
→ Model-agnostic LangGraph agents — route different nodes to Claude, GPT-4o, or local models via one LiteLLM config.
Semantic Kernel accepts LiteLLM's OpenAI-compatible endpoint through a custom connector configuration.
→ Model-agnostic Semantic Kernel agents — swap providers at the LiteLLM layer without changing .NET or Python agent code.
LangChain accepts LiteLLM's OpenAI-compatible endpoint as a drop-in model connector, routing all LLM calls through the proxy.
→ Provider-agnostic LangChain chains — swap between Claude, GPT-4o, and open models by changing one LiteLLM config line.
LlamaIndex accepts LiteLLM's OpenAI-compatible proxy as its LLM backend, routing all generation through any provider.
→ Model-agnostic LlamaIndex pipelines — swap the generation model without touching retrieval or indexing code.
DSPy routes its compilation and inference calls through LiteLLM, enabling multi-provider optimization runs.
→ Model-agnostic DSPy compilation — optimize prompts across Claude, GPT-4o, and open models in one run to find the best-performing variant.
Dify accepts LiteLLM's OpenAI-compatible endpoint as a custom model provider in its model settings.
→ Every model LiteLLM supports becomes available in Dify's no-code workflow builder without touching the workflow itself.
Agno accepts LiteLLM-proxied endpoints as model backends, giving its agents access to any provider through the gateway.
→ Model flexibility across Agno agents — route different agent roles to different models through one LiteLLM config.
LiteLLM sends callback events to Langfuse after every LLM call — one config line captures cost, model, tokens, and latency per request.
→ Per-request observability across every provider in your stack without changing application code.
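The per-request cost figure in those callback events comes from the request's token usage and the model's pricing. A simplified sketch of that calculation (the rates below are illustrative placeholders, not real prices):

```python
# Hypothetical per-1K-token rates in USD -- illustrative only; real rates
# come from the proxy's model pricing map.
RATES = {
    "gpt-4o": (0.005, 0.015),                     # (input, output) per 1K tokens
    "anthropic/claude-3-5-sonnet": (0.003, 0.015),
}

def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate one request's cost from its token usage."""
    rate_in, rate_out = RATES[model]
    return prompt_tokens / 1000 * rate_in + completion_tokens / 1000 * rate_out

cost = request_cost("gpt-4o", 2000, 500)  # 0.0175 with these illustrative rates
```

Aggregating these per-request figures by model or by caller is what turns raw callback events into a cost dashboard.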
LiteLLM can route calls through Helicone as a proxy layer or log directly to Helicone's API after each call.
→ Request replay, caching, and Helicone's rate-limit features layered on top of LiteLLM's provider routing.
LiteLLM recognizes Ollama's local API and includes its models in the unified provider list alongside cloud providers.
→ Local Ollama models treated identically to cloud providers — route between local and cloud by changing one parameter.
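The "one parameter" is the provider prefix on the model string: local and cloud models differ only in the text before the slash. A minimal sketch of that parsing convention (the default-provider assumption is ours):

```python
def split_provider(model: str) -> tuple[str, str]:
    """Split a prefix-style model string into (provider, model name).

    Unprefixed names are assumed to fall through to a default provider.
    """
    if "/" in model:
        provider, name = model.split("/", 1)
        return provider, name
    return "openai", model

# Local and cloud models differ only by prefix:
assert split_provider("ollama/llama3") == ("ollama", "llama3")
```

So moving a workload from cloud to local inference is a rename, not a code change.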
LiteLLM routes to Groq's API using its provider prefix, normalizing Groq's interface into the unified format.
→ Ultra-fast inference on latency-sensitive paths — route to Groq for speed, other providers for quality, via one config.
LiteLLM routes to Together AI's inference API, including its open-source model catalog.
→ Open-source model access at scale via LiteLLM — route cost-sensitive paths to Together AI without changing application code.
LiteLLM connects to a self-hosted vLLM endpoint via its OpenAI-compatible API, treating it as any other provider.
→ Self-hosted GPU inference via vLLM accessible through the same LiteLLM interface as cloud providers — one config for everything.
LiteLLM wraps Anthropic's API with its provider prefix, normalizing Claude's API into OpenAI-compatible format.
→ Claude accessible via the same interface as every other provider — swap with a config change, no code modifications.
LiteLLM routes to OpenAI's API natively, treating it as the default provider in its unified format.
→ OpenAI access through LiteLLM's multi-provider interface — add fallbacks, cost controls, and model swapping without touching app code.
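The fallback behavior mentioned here follows a simple pattern: try models in a configured order and return the first success. A standalone sketch of that pattern (the function and `call` parameter are illustrative, not the proxy's actual internals):

```python
def complete_with_fallbacks(models, call):
    """Try each model in order, returning the first successful response.

    `call` stands in for a provider request; a gateway applies this
    pattern when a fallback list is configured.
    """
    errors = {}
    for model in models:
        try:
            return call(model)
        except Exception as exc:  # a real router would narrow this
            errors[model] = exc
    raise RuntimeError(f"all models failed: {list(errors)}")
```

For example, `complete_with_fallbacks(["gpt-4o", "anthropic/claude-3-5-sonnet"], send)` would fall through to Claude only if the OpenAI call raised.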
LiteLLM routes to Mistral's API using its provider prefix, normalizing Mistral's interface into the standard format.
→ Cost-effective European model access through the same LiteLLM config as Claude and GPT-4o.
LiteLLM routes to Cohere's API for both generation and embedding models.
→ Cohere's strong embedding models accessible alongside generation models through one LiteLLM config — useful for RAG pipelines.
LiteLLM routes to Fireworks AI's fast open-source inference API using its provider prefix.
→ High-throughput, low-latency open-source inference via Fireworks AI routed through the same LiteLLM interface as other providers.
The Vercel AI SDK can point to LiteLLM's OpenAI-compatible endpoint as a custom provider, routing all SDK calls through LiteLLM.
→ Provider-agnostic Vercel AI SDK apps — swap between Claude, GPT-4o, and open models at the LiteLLM layer without changing SDK code.
Not Diamond can be layered over LiteLLM, adding intelligent task-based routing on top of LiteLLM's unified provider API.
→ Best-of-both: LiteLLM's provider coverage with Not Diamond's quality-driven routing.