⚠ This tool appears inactive — no commits in 90+ days. Consider an alternative.
Multimodal · Open Source · ✦ Free Tier

PaliGemma

Google's OSS vision-language model

3,200 stars · Health 55/100 (Slowing) · App Infrastructure
Health score components: commit recency (40 pts), star momentum (30 pts), issue ratio (20 pts), forks (10 pts)

About

Google's open-source multimodal model, which combines a SigLIP vision encoder with a Gemma language model. It is strong at document understanding, OCR, image captioning, and visual question answering (VQA). Weights are available via Hugging Face.
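
As a rough illustration of the Hugging Face route, the sketch below loads a PaliGemma checkpoint with the `transformers` library and generates text for one image. The checkpoint name `google/paligemma-3b-mix-224`, the `caption en` task prompt, and the helper name `describe_image` are assumptions chosen for the example, not something this page specifies. The checkpoint is gated on Hugging Face, so a download will likely need an accepted license and an access token.

```python
def describe_image(image_path, prompt="caption en",
                   model_id="google/paligemma-3b-mix-224", max_new_tokens=32):
    """Run one PaliGemma prompt against one image and return the generated text.

    Hypothetical helper: the default prompt and checkpoint are assumptions.
    """
    # Heavy imports live inside the function so importing this module stays cheap.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                do_sample=False)

    # Drop the prompt tokens so only the newly generated text is decoded.
    return processor.decode(output[0][prompt_len:], skip_special_tokens=True)
```

Calling `describe_image("invoice.png", prompt="ocr")` would be one way to try the document-oriented use case, assuming the chosen checkpoint supports that task prefix.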

Choose PaliGemma when…

  • You need strong OCR and document understanding capabilities
  • You prefer Google's model family and research provenance
  • You want a well-maintained open-weight model from a major lab

Builder Slot

How does your AI see and understand images?
Optional for most stacks

Vision-language models for image understanding, captioning, visual QA, and document parsing

Dev Tools: Not applicable
App Infra: Optional
Hybrid: Optional

Stack Genome Detection

AIchitect's Genome scanner detects PaliGemma in your project via these signals:

pip packages: transformers
env vars: HF_TOKEN
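
The scanner's actual implementation is not described here. As a minimal sketch of how signal matching of this kind could work, the code below checks the text of a requirements file and a `.env` file for the two signals listed above. All function names are hypothetical, and the parsing is deliberately simplified.

```python
import os
import re

# Signals taken from the listing above.
PIP_SIGNALS = {"transformers"}
ENV_SIGNALS = {"HF_TOKEN"}


def parse_requirements(text):
    """Return lowercase package names from requirements.txt-style text."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(match.group(0).lower().replace("_", "-"))
    return names


def parse_env_names(text):
    """Return variable names from .env-style text (NAME=value lines)."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("export "):
            line = line[len("export "):]
        if "=" in line and not line.startswith("#"):
            names.add(line.split("=", 1)[0].strip())
    return names


def detect_signals(requirements_text="", env_text="", environ=None):
    """Report which of the listed signals appear in the given inputs."""
    environ = os.environ if environ is None else environ
    packages = parse_requirements(requirements_text)
    env_names = parse_env_names(env_text) | set(environ)
    return {
        "pip": sorted(PIP_SIGNALS & packages),
        "env": sorted(ENV_SIGNALS & env_names),
    }
```

Both signals are shared by most Hugging Face projects, so a match under this sketch suggests a `transformers`-based stack rather than PaliGemma specifically.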

Pricing

✦ Free tier available

Badge

Add to your GitHub README

PaliGemma on AIchitect
[![PaliGemma](https://www.aichitect.dev/badge/tool/paligemma)](https://www.aichitect.dev/tool/paligemma)
