C++ LLM inference for local and edge deployment
Highly optimized C++ inference engine for running quantized LLMs on CPU and GPU. The foundation for Ollama and many local AI tools.
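As a quick illustration, here is a minimal sketch of running a quantized GGUF model locally through the `llama-cpp-python` bindings (the package named in the detection signals below). The model path and the sampling parameters are placeholder assumptions, not values from this page.

```python
# Minimal sketch: local inference over a quantized GGUF model via the
# llama-cpp-python bindings. The model path below is a hypothetical example.
from llama_cpp import Llama

# Load a quantized model; n_ctx sets the context window, and
# n_gpu_layers > 0 offloads that many layers to the GPU when one is available.
llm = Llama(
    model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf",  # placeholder path
    n_ctx=4096,
    n_gpu_layers=32,
)

# Run a single completion; the call returns an OpenAI-style response dict.
result = llm(
    "Explain GGUF quantization in one sentence.",
    max_tokens=128,
    temperature=0.7,
)
print(result["choices"][0]["text"])
```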
LLM providers and inference servers — where the actual model computation happens
AIchitect's Genome scanner detects llama.cpp in your project via these signals: `llama-cpp-python`, `Modelfile`.
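To make the signals concrete, a detector along these lines might check a project for the two markers. The function below is an illustrative sketch under that assumption, not AIchitect's actual scanner implementation.

```python
# Hypothetical sketch of signal-based detection; not AIchitect's real scanner.
from pathlib import Path

def detects_llama_cpp(project_root: str) -> bool:
    root = Path(project_root)
    # Signal 1: the llama-cpp-python package appears in a dependency file.
    for dep_file in ("requirements.txt", "pyproject.toml"):
        path = root / dep_file
        if path.is_file() and "llama-cpp-python" in path.read_text(errors="ignore"):
            return True
    # Signal 2: an Ollama-style Modelfile exists anywhere in the project tree.
    return any(root.rglob("Modelfile"))

print(detects_llama_cpp("."))
```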
Explore the full AI landscape: see how llama.cpp fits into the bigger picture by browsing all 207 tools and their relationships.