Multimodal · Open Source · ✦ Free Tier

PaliGemma

Google's OSS vision-language model

3,200 stars · App Infrastructure

About

Google's open-source multimodal model combining a SigLIP vision encoder with a Gemma LLM. Strong at document understanding, OCR, image captioning, and visual QA. Available via Hugging Face.
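A minimal sketch of running PaliGemma through the Hugging Face `transformers` library. The checkpoint name (`google/paligemma-3b-mix-224`), the `caption en` prompt, and the `caption_image` helper are illustrative choices, not prescribed by this page; pick the checkpoint and prompt that fit your task, and note the weights require accepting Google's license on the Hub.

```python
# Hedged sketch: one captioning pass with PaliGemma via transformers.
# Requires: pip install transformers pillow torch
# MODEL_ID is an assumed public checkpoint; swap in the one you need.
import os

MODEL_ID = "google/paligemma-3b-mix-224"

def caption_image(image_path: str, prompt: str = "caption en") -> str:
    """Caption a local image. Gated weights are fetched with HF_TOKEN
    from the environment if it is set."""
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    token = os.environ.get("HF_TOKEN")
    processor = AutoProcessor.from_pretrained(MODEL_ID, token=token)
    model = PaliGemmaForConditionalGeneration.from_pretrained(MODEL_ID, token=token)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=32)
    # Decode only the newly generated tokens, skipping the prompt echo.
    return processor.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

The heavy Hub downloads happen inside the function, so importing this module stays cheap; the same pattern works for visual QA or OCR-style prompts by changing `prompt`.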

Choose PaliGemma when…

  • You need strong OCR and document understanding capabilities
  • You prefer Google's model family and research provenance
  • You want a well-maintained open-weight model from a major lab

Builder Slot

How does your AI see and understand images? (Optional for most stacks)

Vision-language models for image understanding, captioning, visual QA, and document parsing

Dev Tools: Not applicable
App Infra: Optional
Hybrid: Optional


Stack Genome Detection

AIchitect's Genome scanner detects PaliGemma in your project via these signals:

pip packages: transformers
env vars: HF_TOKEN
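AIchitect's actual scanner implementation is not shown here, but the two signals above can be probed from within a project with the standard library alone. This is a hedged sketch; `detect_paligemma_signals` and the signal keys are illustrative names, not AIchitect's API.

```python
# Hedged sketch: check the two Genome signals listed above
# (transformers installed, HF_TOKEN set) using only the stdlib.
import os
from importlib import metadata

def detect_paligemma_signals() -> dict:
    """Return which PaliGemma detection signals are present."""
    try:
        metadata.version("transformers")  # raises if not installed
        has_transformers = True
    except metadata.PackageNotFoundError:
        has_transformers = False
    return {
        "pip:transformers": has_transformers,
        "env:HF_TOKEN": "HF_TOKEN" in os.environ,
    }
```

Package presence is read from installed distribution metadata rather than by importing the package, so the check stays fast and side-effect free.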


Pricing

✦ Free tier available

Badge

Add to your GitHub README

PaliGemma on AIchitect:

[![PaliGemma](https://aichitect.dev/badge/tool/paligemma)](https://aichitect.dev/tool/paligemma)

Explore the full AI landscape

See how PaliGemma fits into the bigger picture — browse all 207 tools and their relationships.
