Position: LLMs Need a Bayesian Meta-Reasoning Framework for More Robust and Generalizable Reasoning

Published: 01 May 2025, Last Modified: 23 Jul 2025 · ICML 2025 Position Paper Track poster · CC BY 4.0
Abstract: Large language models (LLMs) excel at many reasoning tasks but continue to face significant challenges, such as a lack of robustness in reasoning, poor cross-task generalization, and inefficiency in scaling up reasoning capabilities. Current training paradigms, including next-token prediction and reinforcement learning from human feedback, often fall short in adapting to diverse reasoning tasks. Existing approaches, such as prompt optimization and iterative output refinement, offer performance improvements but can be inefficient and generalize poorly. To overcome these limitations, this position paper argues for a transformative shift in how LLMs approach reasoning. Drawing inspiration from cognitive science, particularly meta-reasoning theories such as Dual-Process Theory and Metacognitive Reasoning, we propose a Bayesian meta-reasoning framework for LLMs. Our approach integrates self-awareness, monitoring, evaluation, regulation, and meta-reflection to enhance LLMs' ability to refine reasoning strategies and generalize across tasks. We revisit existing LLM reasoning methods, identify key challenges, and suggest directions for future research.
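To make the abstract's monitor-evaluate-regulate loop concrete, the sketch below shows one possible way such a loop could maintain Bayesian beliefs over candidate reasoning strategies. This is an illustrative toy only, not the authors' implementation: the strategy names ("fast-direct", "slow-step-by-step", echoing dual-process theory), the `attempt` stand-in for running and self-evaluating the LLM, and the Beta-Bernoulli Thompson-sampling formulation are all assumptions made for illustration.

```python
# Minimal illustrative sketch (assumed names, not the paper's method):
# Beta-Bernoulli Thompson sampling over candidate reasoning strategies,
# as one concrete realisation of a monitor -> evaluate -> regulate loop
# with Bayesian belief updates.
import random

# Hypothetical strategies; hidden success rates are unknown to the
# meta-reasoner and are used here only to simulate task outcomes.
TRUE_SUCCESS = {"fast-direct": 0.55, "slow-step-by-step": 0.80}

# Beta(1, 1) prior over each strategy's success probability.
beliefs = {name: {"alpha": 1.0, "beta": 1.0} for name in TRUE_SUCCESS}

def attempt(strategy: str) -> bool:
    """Stand-in for running the LLM with the chosen strategy and
    self-evaluating the result (monitoring + evaluation)."""
    return random.random() < TRUE_SUCCESS[strategy]

for step in range(200):
    # Regulation: sample a success probability from each posterior and
    # run the strategy with the highest sample (Thompson sampling).
    sampled = {name: random.betavariate(b["alpha"], b["beta"])
               for name, b in beliefs.items()}
    chosen = max(sampled, key=sampled.get)

    # Meta-reflection: update the chosen strategy's posterior with the
    # (self-evaluated) outcome of this attempt.
    if attempt(chosen):
        beliefs[chosen]["alpha"] += 1
    else:
        beliefs[chosen]["beta"] += 1

# Posterior mean success rate per strategy; belief mass should
# concentrate on the more reliable (slower, deliberate) strategy.
for name, b in beliefs.items():
    print(name, b["alpha"] / (b["alpha"] + b["beta"]))
```

In this toy formulation, "regulation" amounts to choosing which strategy to deploy next, while "meta-reflection" is the posterior update; the actual framework proposed in the paper is broader and defined there, not here.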
Lay Summary: Large language models (LLMs) are getting better at solving different reasoning problems. But they still stumble when asked to reason clearly, adapt to new kinds of problems, or explain their thinking. Why does this happen, and how can we fix it? In this work, we take inspiration from how humans reason — especially from psychology theories about how people monitor and adjust their own thinking. We argue that LLMs should be trained not just to give answers, but also to reflect on how they’re reasoning, much like a person might double-check their logic or change strategies when something feels off. We introduce a new framework that helps LLMs become more self-aware and adaptive by borrowing ideas from cognitive science, such as the “dual-process theory” of fast vs. slow thinking. This framework encourages models to evaluate their own thought processes, regulate their responses, and learn how to generalize their reasoning across a wide range of tasks. Our goal is to spark a shift in how AI systems learn to reason — moving from static answering machines toward dynamic thinkers that can flexibly solve problems and explain their own reasoning. We've also shared tools and resources to help researchers explore this new approach.
Verify Author Names: My co-authors have confirmed that their names are spelled correctly both on OpenReview and in the camera-ready PDF. (If needed, please update ‘Preferred Name’ in OpenReview to match the PDF.)
No Additional Revisions: I understand that after the May 29 deadline, the camera-ready submission cannot be revised before the conference. I have verified with all authors that they approve of this version.
Pdf Appendices: My camera-ready PDF file contains both the main text (not exceeding the page limits) and all appendices that I wish to include. I understand that any other supplementary material (e.g., separate files previously uploaded to OpenReview) will not be visible in the PMLR proceedings.
Latest Style File: I have compiled the camera ready paper with the latest ICML2025 style files <https://media.icml.cc/Conferences/ICML2025/Styles/icml2025.zip> and the compiled PDF includes an unnumbered Impact Statement section.
Paper Verification Code: ZmVlY
Link To Code: https://github.com/hanqi-qi/LLM_MetaReasoning
Permissions Form: pdf
Primary Area: Research Priorities, Methodology, and Evaluation
Keywords: Large language models, reasoning, meta-reasoning
Submission Number: 199