The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs

Jierun Chen, Tiezheng YU, Haoli Bai, Lewei Yao, Jiannan Wu, Kaican Li, Fei Mi, Chaofan Tao, Lei Zhu, Manyi Zhang, Xiao-Hui Li, Lu Hou, Lifeng Shang, Qun Liu

Published: 2026, Last Modified: 17 Mar 2026, Trans. Mach. Learn. Res. 2026, CC BY-SA 4.0