Open-source multimodal LLM assistant
Large Language and Vision Assistant (LLaVA): connects a vision encoder to an LLM so the model can follow instructions about images. An open-source research model widely used as a base for multimodal work. Runs locally via Ollama.
Vision-language models for image understanding, captioning, visual QA, and document parsing
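As a rough illustration of the "runs via Ollama" point, the sketch below sends one visual question to a locally served LLaVA model through Ollama's REST endpoint `/api/generate`. It assumes an Ollama server on its default address (`http://localhost:11434`) with the `llava` model already pulled. The helper names `build_generate_request` and `ask_about_image` are hypothetical and exist only for this example. It uses only the Python standard library.

```python
import base64
import json
import urllib.request

# Assumption: a local Ollama server on its default port, with the model
# already pulled (for example via `ollama pull llava`).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, image_bytes: bytes, model: str = "llava") -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint (hypothetical helper).

    Images go in the "images" list as base64-encoded strings.
    """
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # request a single JSON object instead of a token stream
    }


def ask_about_image(prompt: str, image_path: str, url: str = OLLAMA_URL) -> str:
    """Send one visual question to a local LLaVA model and return its answer (hypothetical helper)."""
    with open(image_path, "rb") as f:
        body = build_generate_request(prompt, f.read())
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # With stream=False the reply is one JSON object; the generated text is in "response".
        return json.loads(resp.read())["response"]
```

Usage, with a hypothetical image file: `ask_about_image("Describe this chart.", "chart.png")`. Building the request body in a separate function keeps the payload easy to inspect or reuse for other tasks in this slot, such as captioning or document parsing, by changing only the prompt.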