Multimodal

10 tools

OSS · Free · Popular

LLaVA

Open-source multimodal LLM assistant

⭐ 22,000

OSS · Free · Popular

Moondream

Tiny OSS vision-language model

⭐ 11,000

OSS · Free

Qwen-VL

Alibaba's open-weight vision-language model line (Qwen2.5-VL → Qwen3-VL)

⭐ 15,000

Free

Fal.ai

Fast serverless inference API for image, video, and audio models

⭐ 10,000

OSS · Free

InternVL2

Top OSS multimodal model from OpenGVLab

⭐ 7,800

OSS · Free

PaliGemma

Google's OSS vision-language model

⭐ 3,200

Pixtral

Mistral's vision-language model — folded into Mistral Small 4 (2026)

Free

Runway Gen-4.5

Frontier video generation with character/scene consistency

Free

Google Veo 3.1

Google's frontier video generation model with native audio

Free

Kling 3.0

Frontier video generation from Kuaishou
