Multimodal · Open Source · ✦ Free Tier

PaliGemma

Google's OSS vision-language model

3,200 stars · App Infrastructure

About

Google's open-source multimodal model combining a SigLIP vision encoder with a Gemma LLM. Strong at document understanding, OCR, image captioning, and visual QA. Available via Hugging Face.
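A minimal sketch of running PaliGemma through the Hugging Face `transformers` library. The checkpoint name (`google/paligemma-3b-mix-224`), the `caption en` prompt, and the `caption_image` helper are illustrative choices, not prescribed by this page; pick the checkpoint and prompt that fit your task, and note the weights require accepting Google's license on the Hub.

```python
# Hedged sketch: one captioning pass with PaliGemma via transformers.
# Requires: pip install transformers pillow torch
# MODEL_ID is an assumed public checkpoint; swap in the one you need.
import os

MODEL_ID = "google/paligemma-3b-mix-224"

def caption_image(image_path: str, prompt: str = "caption en") -> str:
    """Caption a local image. Gated weights are fetched with HF_TOKEN
    from the environment if it is set."""
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    token = os.environ.get("HF_TOKEN")
    processor = AutoProcessor.from_pretrained(MODEL_ID, token=token)
    model = PaliGemmaForConditionalGeneration.from_pretrained(MODEL_ID, token=token)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=32)
    # Decode only the newly generated tokens, skipping the prompt echo.
    return processor.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

The heavy Hub downloads happen inside the function, so importing this module stays cheap; the same pattern works for visual QA or OCR-style prompts by changing `prompt`.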

Choose PaliGemma when…

  • You need strong OCR and document understanding capabilities
  • You prefer Google's model family and research provenance
  • You want a well-maintained open-weight model from a major lab

Builder Slot

How does your AI see and understand images? (Optional for most stacks)

Vision-language models for image understanding, captioning, visual QA, and document parsing

Dev Tools: Not applicable
App Infra: Optional
Hybrid: Optional


Stack Genome Detection

AIchitect's Genome scanner detects PaliGemma in your project via these signals:

pip packages: transformers
env vars: HF_TOKEN
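AIchitect's actual scanner implementation is not shown here, but the two signals above can be probed from within a project with the standard library alone. This is a hedged sketch; `detect_paligemma_signals` and the signal keys are illustrative names, not AIchitect's API.

```python
# Hedged sketch: check the two Genome signals listed above
# (transformers installed, HF_TOKEN set) using only the stdlib.
import os
from importlib import metadata

def detect_paligemma_signals() -> dict:
    """Return which PaliGemma detection signals are present."""
    try:
        metadata.version("transformers")  # raises if not installed
        has_transformers = True
    except metadata.PackageNotFoundError:
        has_transformers = False
    return {
        "pip:transformers": has_transformers,
        "env:HF_TOKEN": "HF_TOKEN" in os.environ,
    }
```

Package presence is read from installed distribution metadata rather than by importing the package, so the check stays fast and side-effect free.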


Pricing

✦ Free tier available

Badge

Add to your GitHub README

PaliGemma on AIchitect:

[![PaliGemma](https://aichitect.dev/badge/tool/paligemma)](https://aichitect.dev/tool/paligemma)

Explore the full AI landscape

See how PaliGemma fits into the bigger picture — browse all 207 tools and their relationships.
