LLM Infrastructure
43 tools
HuggingFace
Open ML model hub and inference platform
Ollama
Run LLMs locally via simple CLI/API
llama.cpp
C++ LLM inference for local and edge deployment
LiteLLM
Universal LLM proxy — 100+ models, one API
OpenRouter
Unified API routing to 200+ LLMs
Pinecone
Managed vector DB service
Anthropic API
Claude models API by Anthropic
OpenAI API
GPT-5 era models, embeddings, and Responses API from OpenAI
Google Gemini API
Google's frontier multimodal model API — Gemini Pro and Flash
Amazon Bedrock
AWS managed AI service with access to foundation models from multiple providers
Azure OpenAI
OpenAI models hosted on Azure with enterprise compliance and SLAs
Ray
Distributed computing framework for ML workloads
vLLM
High-throughput LLM serving with PagedAttention
Milvus
Distributed vector database built for scale
Redis Vector
In-memory vector search built into Redis — no separate DB needed
Qdrant
High-performance vector DB with filtering
Chroma
Lightweight embedded vector DB for AI apps
pgvector
PostgreSQL extension for vector similarity search
Vercel AI SDK
TypeScript SDK for streaming AI UIs
Weaviate
Cloud-native vector search engine
LanceDB
Serverless vector DB built on Apache Arrow — embedded or cloud
RunPod
Serverless GPU cloud for AI inference and training
Unify
Route prompts to the best model dynamically by cost, speed, or quality
Portkey
AI gateway with routing, fallbacks, and caching
Groq
Ultra-fast LLM inference via LPU hardware
Together AI
Fast inference API for open-source models
Fireworks AI
Fast inference with function calling and fine-tuning
Modal
Cloud platform for GPU inference and training
Mistral API
Mistral Large, Mistral Small 4 (unified multimodal), and open-weight families
Cohere API
Command and Embed models for enterprise NLP
Replicate
Run open-source ML models via API
Martian
Intelligent model router that picks the right LLM for every request
Not Diamond
AI model router that learns which LLM performs best for your tasks
Cerebras
Wafer-scale chip inference API built for very high token throughput
DeepInfra
Serverless GPU inference for open-source LLMs at low cost
Baseten
Deploy any ML model as a low-latency production API
Lambda Labs
GPU cloud and API for training and serving AI models
Perplexity API
LLM API with real-time web search grounding built in
turbopuffer
Serverless vector database built for scale — no infrastructure to manage
xAI Grok API
xAI's Grok models with real-time knowledge and strong reasoning
DeepSeek API
High-performance frontier model API at low per-token prices
SambaNova Cloud
High-speed LLM inference API, advertised at 200+ tokens/sec on Llama 405B
Cloudflare AI Gateway
LLM gateway with caching, analytics, and rate limiting