InternVL2
The InternVL2 series from Shanghai AI Lab is consistently top-ranked on open-source multimodal benchmarks. It is strong at document understanding, chart analysis, and multi-image reasoning.
Qwen-VL
Qwen-VL is Alibaba's Qwen vision-language model series. As of 2026, the frontier open-source multimodal model is Qwen3-VL-235B-A22B-Instruct, a mixture-of-experts model with 235B total and 22B active parameters, which rivals Gemini 2.5 Pro and GPT-5 on visual reasoning. The series is strong at multilingual visual understanding, document parsing, and chart question answering (chart QA).