STEP3-VL-10B Technical Report

January 14, 2026

93 authors

Abstract

We present STEP3-VL-10B, a lightweight open-source foundation model designed to redefine the trade-off between compact efficiency and frontier-level multimodal intelligence. STEP3-VL-10B is realized through two strategic shifts: first, a unified, fully unfrozen pre-training strategy on 1.2T multimodal tokens that integrates a language-aligned Perception Encoder with a Qwen3-8B decoder to establish intrinsic vision-language synergy; and second, a scaled post-training pipeline featuring over 1k iterations of reinforcement learning. Crucially, we implement Parallel Coordinated Reasoning (PaCoRe) to scale test-time compute, allocating resources to scalable perceptual reasoning that explores and synthesizes diverse visual hypotheses. Consequently, despite its compact 10B footprint, STEP3-VL-10B rivals or surpasses models 10$\times$-20$\times$ larger (e.g., GLM-4.6V-106B, Qwen3-VL-235B) and top-tier proprietary flagships like Gemini 2.5 Pro and Seed-1.5-VL. Delivering best-in-class performance, it records 92.2% on MMBench and 80.11% on MMMU, while excelling in complex reasoning with 94.43% on AIME2025 and 75.95% on MathVision. We release the full model suite to provide the community with a powerful, efficient, and reproducible baseline.

Categories

cs.CV

Authors

Ailin Huang, Chengyuan Yao, Chunrui Han, Fanqi Wan, Hangyu Guo, Haoran Lv, Hongyu Zhou, Jia Wang, Jian Zhou, Jianjian Sun, Jingcheng Hu, Kangheng Lin, Liang Zhao, Mitt Huang, Song Yuan, Wenwen Qu, Xiangfeng Wang, Yanlin Lai, Yingxiu Zhao, Yinmin Zhang, Yukang Shi, Yuyang Chen, Zejia Weng, Ziyang Meng, Ang Li, Aobo Kong, Bo Dong, Changyi Wan, David Wang, Di Qi, Dingming Li, En Yu, Guopeng Li, Haiquan Yin, Han Zhou, Hanshan Zhang, Haolong Yan, Hebin Zhou, Hongbo Peng, Jiaran Zhang, Jiashu Lv, Jiayi Fu, Jie Cheng, Jie Zhou, Jisheng Yin, Jingjing Xie, Jingwei Wu, Jun Zhang, Junfeng Liu, Kaijun Tan, Kaiwen Yan, Liangyu Chen, Lina Chen, Mingliang Li, Qian Zhao, Quan Sun, Shaoliang Pang, Shengjie Fan, Shijie Shang, Siyuan Zhang, Tianhao You, Wei Ji, Wuxun Xie, Xiaobo Yang, Xiaojie Hou, Xiaoran Jiao, Xiaoxiao Ren, Xiangwen Kong, Xin Huang, Xin Wu, Xing Chen, Xinran Wang, Xuelin Zhang, Yana Wei, Yang Li, Yanming Xu, Yeqing Shen, Yuang Peng, Yue Peng, Yu Zhou, Yusheng Li, Yuxiang Yang, Yuyang Zhang, Zhe Xie, Zhewei Huang, Zhenyi Lu, Zhimin Fan, Zihui Cheng, Daxin Jiang, Qi Han, Xiangyu Zhang, Yibo Zhu, Zheng Ge