What Makes Reasoning Invalid: Echo Reflection Mitigation for Large Language Models

ACL ARR 2026 January Submission8797 Authors

06 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, License: CC BY 4.0
Keywords: Large language model, Reinforcement learning, Echo reflection
Abstract: Large Language Models (LLMs) excel at various reasoning tasks, including complex mathematical reasoning. However, when applied to domain-specific tasks, they consistently fail to generate novel insights during the reflection stage. Instead of conducting genuine cognitive refinement, the model tends to mechanically reiterate earlier reasoning steps without introducing new information or perspectives, a phenomenon we refer to as “Echo Reflection”. We attribute this behavior to two key defects: (1) uncontrollable information flow during response generation, which allows premature intermediate thoughts to propagate unchecked and distort final decisions; and (2) an imbalance between exploration and exploitation of domain-relevant internal knowledge, which leads the model to repeat earlier findings rather than generate new cognitive insights. Building on these findings, we propose a novel reinforcement learning method termed Adaptive Entropy Policy Optimization (AEPO). The AEPO framework consists of two major components: (1) Reflection-aware Information Filtration, which quantifies the cognitive information flow and prevents earlier low-quality cognitive information from corrupting the final answer; and (2) Adaptive-Entropy Optimization, which dynamically balances exploration and exploitation across different reasoning stages, promoting both reflective diversity and answer correctness. Extensive experiments demonstrate that AEPO consistently achieves state-of-the-art performance over mainstream reinforcement learning baselines across diverse benchmarks. Our code is available at https://anonymous.4open.science/r/AEPO-7F3A.
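The abstract does not specify the AEPO objective, but the general idea of stage-wise adaptive entropy regularization can be sketched as follows. This is a minimal illustrative sketch, not the authors' method: the function names, entropy targets, and the linear coefficient schedule are all assumptions chosen for clarity.

```python
import numpy as np

def entropy(probs):
    # Shannon entropy (in nats) of a categorical policy distribution.
    probs = np.clip(probs, 1e-12, 1.0)
    return float(-np.sum(probs * np.log(probs)))

def adaptive_entropy_bonus(stage_entropies, target_low=0.5, target_high=1.5,
                           beta_min=0.0, beta_max=0.1):
    # Hypothetical per-stage schedule: raise the entropy coefficient when a
    # reasoning stage's policy entropy falls below target_low (push toward
    # exploration, i.e. more diverse reflection) and lower it toward beta_min
    # when entropy exceeds target_high (push toward exploitation).
    betas = []
    for h in stage_entropies:
        if h < target_low:
            betas.append(beta_max)
        elif h > target_high:
            betas.append(beta_min)
        else:
            # Linear interpolation between the two entropy targets.
            frac = (target_high - h) / (target_high - target_low)
            betas.append(beta_min + frac * (beta_max - beta_min))
    return betas
```

Under this sketch, the per-stage coefficient would weight an entropy bonus added to the policy-gradient loss, so low-entropy (repetitive) reflection stages are nudged to explore while high-entropy answer stages are nudged to commit.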
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: Machine Learning for NLP, Question Answering
Contribution Types: Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 8797