CoT-MT$^3$ : CoT-Guided Meta Test-Time Training for Multimodal Reasoning

ICLR 2026 Conference Submission2428 Authors

05 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Test-Time Training, Meta Learning, Multimodal Reasoning
Abstract: Large Multimodal Models (LMMs) have achieved remarkable results across various tasks, but they still face challenges in complex multimodal reasoning, which is typically performed via chain-of-thought (CoT). Recent studies have also begun to explore the retrieval-augmented few-shot setting to alleviate this problem. However, existing methods still lack a tailored retrieval strategy and effective utilization of demonstrations in complex reasoning scenarios, resulting in limited reasoning improvements. In this paper, we introduce a novel framework, termed CoT-Guided Meta Test-Time Training (CoT-MT$^3$), to enhance LMMs' few-shot multimodal reasoning ability by employing a CoT-guided Weighted Retrieval (CWR) strategy and a Meta Test-Time Training (MT$^3$) paradigm. To provide more relevant demonstrations, CWR employs a retrieval-specific CoT to highlight the key information and deep reasoning of the test query needed for problem-solving. Retrieval is then performed based on the weighted similarity of both the original query and the derived CoT cues. Moreover, to fully leverage the retrieved demonstrations, MT$^3$ introduces a context-based meta-learning paradigm that constructs multiple training samples per query from the few-shot demonstrations, with varying context sizes and combinations. Experiments across three benchmarks show that our CoT-MT$^3$ achieves a significant relative improvement of up to 4.82\% on MathVerse and 8.38\% on We-Math in the 4-shot setting. Notably, we also observe that our CoT-MT$^3$ demonstrates exceptional robustness across different context sizes, highlighting its effectiveness and generalization to few-shot reasoning scenarios.
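The two mechanisms described in the abstract can be sketched concretely. Below is a minimal, hypothetical illustration (not the authors' implementation): CWR scores candidate demonstrations by a weighted cosine similarity between their embeddings and both the query embedding and the retrieval-specific CoT embedding, and MT$^3$-style meta-contexts enumerate demonstration subsets of varying sizes. The weight `alpha`, the embedding shapes, and all function names are assumptions for illustration only.

```python
from itertools import combinations

import numpy as np


def cwr_scores(q_emb, cot_emb, demo_embs, alpha=0.5):
    """CoT-guided Weighted Retrieval (sketch): score each candidate
    demonstration by a weighted sum of cosine similarity to the original
    query embedding and to the derived CoT embedding.
    `alpha` is a hypothetical weighting hyperparameter."""
    def cos(a, B):
        a = a / np.linalg.norm(a)
        B = B / np.linalg.norm(B, axis=1, keepdims=True)
        return B @ a  # one cosine similarity per candidate row

    return alpha * cos(q_emb, demo_embs) + (1 - alpha) * cos(cot_emb, demo_embs)


def retrieve_topk(q_emb, cot_emb, demo_embs, k=4, alpha=0.5):
    """Return indices of the k highest-scoring demonstrations."""
    scores = cwr_scores(q_emb, cot_emb, demo_embs, alpha)
    return np.argsort(-scores)[:k]


def meta_contexts(demos, max_shots=4):
    """MT^3-style context construction (sketch): build multiple training
    contexts per query by enumerating subsets of the retrieved
    demonstrations with varying sizes and combinations."""
    ctxs = []
    for n in range(1, max_shots + 1):
        ctxs.extend(combinations(demos, n))
    return ctxs
```

In this reading, test-time training would then be run over the enumerated contexts so the model adapts to demonstrations of any size or combination, which is one plausible source of the robustness to context size reported in the abstract.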
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 2428