Abstract: Recent approaches introduce chain-of-thought (CoT) reasoning to mitigate challenges such as hallucination and reasoning deficits in multimodal large language models (MLLMs) and to enhance performance. However, existing CoT-based methods often rely on extensive data annotation and training. To overcome these limitations, we propose a training-free framework for autonomous and reliable reasoning (TFAR), which uses only common lightweight vision tools to improve the reasoning ability of MLLMs. TFAR enables an MLLM to autonomously and accurately identify relevant regions of interest (RoIs) to support CoT reasoning, without requiring additional training or annotations, and with low computational overhead during inference. However, using external tools introduces noise and uncertainty. To mitigate this uncertainty and select the optimal reasoning pathway, we propose a conformal prediction-based uncertainty quantification method that calibrates the outputs of external tools and dynamically selects the most appropriate tool based on the MLLM’s output uncertainty. Experiments across five datasets demonstrate that TFAR improves performance over the base MLLM by an average of 4.6$\%$, in some cases even outperforming fine-tuned baselines, while maintaining low inference cost. These results offer new insights into training-free CoT guidance for MLLMs and underscore the value of reliable visual tools.
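The abstract does not spell out the calibration procedure, but the conformal prediction component can be illustrated with a minimal split-conformal sketch: nonconformity scores from a held-out calibration set yield a per-tool threshold with $(1-\alpha)$ coverage, and at inference the tool with the lowest calibrated score among those under threshold is selected. All names here (`conformal_threshold`, `select_tool`), the `alpha` level, and the fallback rule are hypothetical assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

def conformal_threshold(cal_scores, alpha=0.1):
    """Split-conformal quantile: given nonconformity scores from a
    held-out calibration set, return a threshold that covers true
    outputs with probability at least 1 - alpha (exchangeability
    assumed)."""
    n = len(cal_scores)
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(cal_scores, q, method="higher")

def select_tool(tool_scores, thresholds):
    """Pick the tool whose nonconformity score is lowest among those
    falling under their calibrated threshold; if none qualify, fall
    back to the globally lowest-scoring tool (a hypothetical choice)."""
    admissible = {t: s for t, s in tool_scores.items() if s <= thresholds[t]}
    pool = admissible if admissible else tool_scores
    return min(pool, key=pool.get)

# Example usage with made-up scores for two vision tools:
thresholds = {
    "detector": conformal_threshold(np.random.rand(500), alpha=0.1),
    "ocr": conformal_threshold(np.random.rand(500), alpha=0.1),
}
print(select_tool({"detector": 0.21, "ocr": 0.47}, thresholds))
```

Under this reading, calibration is done once per tool offline, so the per-query cost at inference is a constant-time threshold check, consistent with the low-overhead claim.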
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Nicolas_THOME2
Submission Number: 5064