LLaVA

Open-source multimodal LLM assistant

⭐ 22,000 stars● Health 55/100 — Slowing· commit recency (40 pts) · star momentum (30 pts) · issue ratio (20 pts) · forks (10 pts)App Infrastructure

Open in Builder →Website ↗GitHub ↗

About

Large Language and Vision Assistant — connects a vision encoder to an LLM for instruction-following with images. OSS research model widely used as a multimodal base. Runs via Ollama.

Choose LLaVA when…

•You want an open-source multimodal model for self-hosted deployment
•You're doing research on vision-language instruction following
•You need a well-documented baseline for multimodal tasks

Builder Slot

How does your AI see and understand images?Optional for most stacks

Vision-language models for image understanding, captioning, visual QA, and document parsing

Dev Tools

Not applicable

App Infra

Optional

Hybrid

Optional

Other tools in this slot:

Fal.ai Moondream PaliGemma Pixtral Qwen-VL InternVL2 Runway Gen-4.5 Google Veo 3.1 +1 more

Integrates with (1)

OllamaLLM Infrastructure

Ollama bundles LLaVA as a local model, exposing it via an OpenAI-compatible REST endpoint.

→ Run LLaVA vision tasks offline with a single command — no GPU cloud account required.

Compare →

Alternatives to consider (2)

Moondreamcompare →InternVL2compare →

Pricing

✦ Free tier available

Pulse

● No incidents in the last 90 days

Recent Activity

Health ↑ 40 → 55

3 months ago

↗

Pricing updated

4 months ago

↗

View all activity for this tool →

Badge

Add to your GitHub README

[![LLaVA](https://www.aichitect.dev/badge/tool/llava)](https://www.aichitect.dev/tool/llava)

Explore the full AI landscape

See how LLaVA fits into the bigger picture — browse all 207 tools and their relationships.

Explore graph →