ProxyThinker: Test-Time Guidance through Small Visual Reasoners

Published: 17 Oct 2025, Last Modified: 21 Nov 2025 · MATH-AI 2025 Poster · CC BY 4.0
Keywords: decoding-time algorithms, visual reasoning
TL;DR: ProxyThinker enables large vision-language models to inherit slow-thinking visual reasoning skills from smaller RFT models at inference time
Abstract: Recent advances in reinforcement learning with verifiable rewards have pushed the boundaries of visual reasoning in large vision-language models (LVLMs). However, training LVLMs with reinforcement fine-tuning (RFT) is computationally expensive, posing a significant challenge to scaling model size. In this work, we propose ProxyThinker, an inference-time technique that enables large models to inherit visual reasoning capabilities from small, slow-thinking visual reasoners without any training. By subtracting the output distributions of base models from those of RFT reasoners, ProxyThinker modifies the decoding dynamics and successfully elicits slow-thinking reasoning, as demonstrated by emergent sophisticated behaviors such as self-verification and self-correction. ProxyThinker consistently boosts performance on challenging visual benchmarks for mathematical and multi-disciplinary reasoning, enabling untuned base models to compete with their full-scale RFT counterparts. Code is available at https://github.com/MrZilinXiao/ProxyThinker.
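The core decoding-time operation described in the abstract (adding the difference between a small RFT reasoner's logits and its small base model's logits to the large model's logits) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names and the `alpha` guidance weight are assumptions, and real usage would apply this per decoding step over a shared vocabulary.

```python
import numpy as np

def proxythinker_logits(large_base: np.ndarray,
                        small_rft: np.ndarray,
                        small_base: np.ndarray,
                        alpha: float = 1.0) -> np.ndarray:
    """Guide the large model's next-token logits with the delta between
    a small RFT reasoner and its untuned base (all over the same vocab)."""
    # The (small_rft - small_base) difference captures what RFT changed;
    # adding it steers the large model toward slow-thinking behavior.
    return large_base + alpha * (small_rft - small_base)

def softmax(logits: np.ndarray) -> np.ndarray:
    """Convert guided logits back to a next-token distribution."""
    z = logits - logits.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()
```

For example, if the small RFT reasoner strongly upweights a reasoning token relative to its base, the guided distribution shifts the large model toward that token even when the large model alone would have ranked it lower.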
Submission Number: 53