Google's OSS vision-language model
Google's open-source multimodal model, combining a SigLIP vision encoder with a Gemma language model. It is strong at document understanding, OCR, image captioning, and visual question answering, and is available via Hugging Face.
Vision-language models for image understanding, captioning, visual QA, and document parsing
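As a rough illustration of the tasks listed above, here is a minimal sketch of calling PaliGemma through the Hugging Face `transformers` library. The checkpoint name `google/paligemma-3b-mix-224` and the task prefixes (`caption en`, `ocr`, `answer en ...`) follow the conventions on the public model card and are assumptions here, not something this page specifies. `build_prompt` and `run_paligemma` are hypothetical helper names. Access to the weights may require accepting the model license and supplying a token.

```python
import os

# Assumed checkpoint name; other PaliGemma checkpoints exist on the Hugging Face Hub.
MODEL_ID = "google/paligemma-3b-mix-224"


def build_prompt(task: str, text: str = "", lang: str = "en") -> str:
    """Build a task prefix in the style the PaliGemma model card describes (assumed)."""
    if task == "caption":
        return f"caption {lang}"
    if task == "ocr":
        return "ocr"
    if task == "vqa":
        if not text:
            raise ValueError("vqa needs a question")
        return f"answer {lang} {text}"
    raise ValueError(f"unknown task: {task}")


def run_paligemma(image_path: str, prompt: str, max_new_tokens: int = 64) -> str:
    """Run one prompt against one image and return the decoded model output."""
    # Heavy imports stay local so build_prompt works without them installed.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    # The token is read from the environment; it may be None for public access.
    token = os.environ.get("HF_TOKEN")
    processor = AutoProcessor.from_pretrained(MODEL_ID, token=token)
    model = PaliGemmaForConditionalGeneration.from_pretrained(MODEL_ID, token=token).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Drop the prompt tokens so only the newly generated text is returned.
    return processor.decode(output[0][prompt_len:], skip_special_tokens=True)
```

A call such as `run_paligemma("invoice.png", build_prompt("ocr"))` would download the weights on first use, so it is best tried on a machine with enough memory for a 3B-parameter model.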
AIchitect's Genome scanner detects PaliGemma in your project via these signals:
transformers
HF_TOKEN

Health ↑ 40 → 55 (12 days ago)
Pricing updated (5 weeks ago)
Crossed 1,000 stars ⭐ (6 weeks ago)
Went stale: 320 days without a commit (6 weeks ago)
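The two detection signals named on this page, the `transformers` package and the `HF_TOKEN` environment variable, could be checked along the following lines. This is only a sketch: how AIchitect's scanner actually works is not described here, and the file names it inspects (`requirements.txt`, `pyproject.toml`, `.env`, `.env.example`) and the `find_signals` helper are hypothetical. Both signals are shared by many Hugging Face models, so a match is a hint rather than proof that PaliGemma is in use.

```python
import re
from pathlib import Path

# Signals named on this page; the real scanner's matching rules are not documented here.
DEPENDENCY_SIGNAL = "transformers"
ENV_SIGNAL = "HF_TOKEN"

# Hypothetical choice of files to inspect.
DEPENDENCY_FILES = ("requirements.txt", "pyproject.toml")
ENV_FILES = (".env", ".env.example")

# Match the bare package name, but not names such as "sentence-transformers".
DEP_PATTERN = re.compile(rf"(?<![\w-]){re.escape(DEPENDENCY_SIGNAL)}(?![\w-])")
# Match an assignment like "HF_TOKEN=..." or "export HF_TOKEN=..." at line start.
ENV_PATTERN = re.compile(rf"^\s*(?:export\s+)?{re.escape(ENV_SIGNAL)}\s*=", re.MULTILINE)


def _any_file_matches(root: Path, names, pattern) -> bool:
    """Return True if any of the named files exists under root and matches pattern."""
    for name in names:
        path = root / name
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="ignore")
            if pattern.search(text):
                return True
    return False


def find_signals(project_dir: str) -> dict:
    """Report which of the two signals appear in the project's top-level files."""
    root = Path(project_dir)
    return {
        DEPENDENCY_SIGNAL: _any_file_matches(root, DEPENDENCY_FILES, DEP_PATTERN),
        ENV_SIGNAL: _any_file_matches(root, ENV_FILES, ENV_PATTERN),
    }
```

Calling `find_signals(".")` from a project root returns a dictionary with one boolean per signal, which a caller could combine however it sees fit.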