LLM Infrastructure

Hugging Face

Open ML model hub and inference platform

⭐ 135,000

Ollama

Run LLMs locally via simple CLI/API

⭐ 90,000

llama.cpp

C++ LLM inference for local and edge deployment

⭐ 68,000

LiteLLM

Universal LLM proxy — 100+ models, one API

⭐ 16,000

OpenRouter

Unified API routing to 200+ LLMs

Pinecone

Managed vector DB service

Popular

Anthropic API

Claude models API by Anthropic

Popular

OpenAI API

GPT-5-era models, embeddings, and the Responses API from OpenAI

Google Gemini API

Google's frontier multimodal model API — Gemini Pro and Flash

Amazon Bedrock

AWS managed AI service with access to foundation models from multiple providers

Popular

Azure OpenAI

OpenAI models hosted on Azure with enterprise compliance and SLAs

Ray

Distributed computing framework for ML workloads

⭐ 33,000

vLLM

High-throughput LLM serving with PagedAttention

⭐ 32,000

Milvus

Distributed vector database built for scale

⭐ 31,000

Redis Vector

In-memory vector search built into Redis — no separate DB needed

⭐ 23,000

Qdrant

High-performance vector DB with filtering

⭐ 20,000

Chroma

Lightweight embedded vector DB for AI apps

⭐ 15,000

pgvector

PostgreSQL extension for vector similarity search

⭐ 13,000

Vercel AI SDK

TypeScript SDK for streaming AI UIs

⭐ 12,000

Weaviate

Cloud-native vector search engine

⭐ 11,000

LanceDB

Serverless vector DB built on Apache Arrow — embedded or cloud

⭐ 5,800

RunPod

Serverless GPU cloud for AI inference and training

⭐ 1,200

Unify

Route prompts to the best model dynamically by cost, speed, or quality

⭐ 800

Portkey

AI gateway with routing, fallbacks, and caching

Groq

Ultra-fast LLM inference via LPU hardware

Together AI

Fast inference API for open-source models

Fireworks AI

Fast inference with function calling and fine-tuning

Modal

Cloud platform for GPU inference and training

Mistral API

Mistral Large, Mistral Small 4 (unified multimodal), and open-weight families

Cohere API

Command and Embed models for enterprise NLP

Replicate

Run open-source ML models via API

Martian

Intelligent model router that picks the right LLM for every request

Not Diamond

AI model router that learns which LLM performs best for your tasks

Cerebras

LLM inference API on wafer-scale chips, built for very high token throughput

DeepInfra

Serverless GPU inference for open-source LLMs at low cost

Baseten

Deploy any ML model as a low-latency production API

Lambda Labs

GPU cloud and API for training and serving AI models

Perplexity API

LLM API with real-time web search grounding built in

turbopuffer

Serverless vector database built for scale — no infrastructure to manage

xAI Grok API

xAI's Grok models with real-time knowledge and strong reasoning

DeepSeek API

High-performance frontier model API priced well below comparable frontier APIs

SambaNova Cloud

High-speed LLM inference API, with a claimed 200+ tokens/sec on Llama 405B

Cloudflare AI Gateway

LLM gateway with caching, analytics, and rate limiting