Reasoning as Meta-Learning: An Optimization Perspective to Decipher Long CoT Reasoning in LLMs

ICLR 2026 Conference Submission 20704 Authors

19 Sept 2025 (modified: 08 Oct 2025) · CC BY 4.0
Keywords: large language model reasoning, interpretability, optimization, meta learning, reinforcement learning
TL;DR: We propose a novel framework for interpreting the reasoning capabilities of LLMs through the perspective of meta-learning.
Abstract: We propose RaML, a novel framework for interpreting the reasoning capabilities of large language models (LLMs) through the perspective of meta-learning. By conceptualizing reasoning trajectories as pseudo-gradient descent updates to the LLM's parameters, we identify parallels between LLM reasoning and various meta-learning paradigms. We formalize the training process for reasoning tasks as a meta-learning setup, with each question treated as an individual task and reasoning trajectories serving as the inner-loop optimization that adapts model parameters. Once trained on a diverse set of questions, the LLM develops fundamental reasoning capabilities that generalize to previously unseen questions. Extensive empirical evaluations substantiate the strong connection between LLM reasoning and meta-learning, and we further explore how the RaML framework can advance LLM reasoning. Our work deepens the understanding of LLM reasoning processes and provides actionable insights for enhancing these models through established meta-learning techniques.
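The meta-learning setup sketched in the abstract — each question as a task, the reasoning trajectory as inner-loop pseudo-gradient steps, and a shared outer loop across tasks — can be illustrated with a toy first-order meta-learning loop. This is a minimal sketch under invented assumptions (quadratic task losses, a Reptile-style outer update), not the paper's actual RaML procedure:

```python
import numpy as np

def inner_loop(theta, target, steps=5, lr=0.3):
    """Adapt shared parameters to one task with loss ||w - target||^2.

    The sequence of w values plays the role of a reasoning trajectory:
    each step is one pseudo-gradient descent update.
    """
    w = theta.copy()
    for _ in range(steps):
        grad = 2.0 * (w - target)  # gradient of the task-specific loss
        w = w - lr * grad          # one inner-loop update
    return w

def meta_train(targets, meta_lr=0.5, epochs=50):
    """Outer loop: pull meta-parameters toward each task's adapted solution
    (a first-order, Reptile-style update)."""
    theta = np.zeros(2)
    for _ in range(epochs):
        for t in targets:
            adapted = inner_loop(theta, t)
            theta = theta + meta_lr * (adapted - theta)
    return theta

# Two toy "questions" (tasks); the meta-parameters settle between them,
# so a few inner steps adapt quickly to either task.
tasks = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
theta = meta_train(tasks)
adapted = inner_loop(theta, tasks[0])  # adaptation to a specific question
```

The point of the sketch is the two-level structure: the inner loop never changes the shared parameters directly, mirroring the abstract's framing of a reasoning trajectory as an implicit (pseudo-gradient) adaptation rather than an actual weight update.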
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 20704