C++ LLM inference for local and edge deployment
Highly optimized C++ inference engine for running quantized LLMs on CPU and GPU. The foundation for Ollama and many local AI tools.
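As a quick illustration, here is a minimal sketch of running a quantized GGUF model locally through the `llama-cpp-python` bindings (the package named in the detection signals below). The model path and the sampling parameters are placeholder assumptions, not values from this page.

```python
# Minimal sketch: local inference over a quantized GGUF model via the
# llama-cpp-python bindings. The model path below is a hypothetical example.
from llama_cpp import Llama

# Load a quantized model; n_ctx sets the context window, and
# n_gpu_layers > 0 offloads that many layers to the GPU when one is available.
llm = Llama(
    model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf",  # placeholder path
    n_ctx=4096,
    n_gpu_layers=32,
)

# Run a single completion; the call returns an OpenAI-style response dict.
result = llm(
    "Explain GGUF quantization in one sentence.",
    max_tokens=128,
    temperature=0.7,
)
print(result["choices"][0]["text"])
```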
LLM providers and inference servers — where the actual model computation happens
AIchitect's Genome scanner detects llama.cpp in your project via these signals: `llama-cpp-python`, `Modelfile`.
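To make the signals concrete, a detector along these lines might check a project for the two markers. The function below is an illustrative sketch under that assumption, not AIchitect's actual scanner implementation.

```python
# Hypothetical sketch of signal-based detection; not AIchitect's real scanner.
from pathlib import Path

def detects_llama_cpp(project_root: str) -> bool:
    root = Path(project_root)
    # Signal 1: the llama-cpp-python package appears in a dependency file.
    for dep_file in ("requirements.txt", "pyproject.toml"):
        path = root / dep_file
        if path.is_file() and "llama-cpp-python" in path.read_text(errors="ignore"):
            return True
    # Signal 2: an Ollama-style Modelfile exists anywhere in the project tree.
    return any(root.rglob("Modelfile"))

print(detects_llama_cpp("."))
```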
Explore the full AI landscape: see how llama.cpp fits into the bigger picture by browsing all 207 tools and their relationships.