Evolving Pareto-Optimal Reasoning Paths in LLMs

Published: 15 Mar 2026, Last Modified: 15 Mar 2026, Oral, CC BY 4.0
Keywords: mathematical reasoning, NSGA-II, self-evolve
Abstract: Recent advances in Large Language Models (LLMs) have demonstrated remarkable capabilities in mathematical reasoning. However, relying solely on greedy decoding or multi-agent approaches often leads to suboptimal reasoning paths, particularly in complex, multi-step problems. In this paper, we propose a novel Self-Evolving Reasoning Framework that treats the generation of reasoning steps as a multi-objective optimization problem. Unlike traditional methods that maximize a scalar reward, our approach uses the Non-dominated Sorting Genetic Algorithm II (NSGA-II) to navigate the trade-offs among conflicting objectives, such as solution accuracy, logical coherence, and computational efficiency (penalty minimization). The framework generates a population of reasoning paths and iteratively evolves them to approximate the Pareto-optimal front. Experimental results demonstrate that our framework significantly enhances the agent's robustness and adaptability in solving complex mathematical problems.
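The paper's implementation is not shown here, but the abstract's core mechanism, ranking a population of reasoning paths by Pareto dominance as NSGA-II does, can be sketched concretely. The following minimal Python sketch illustrates non-dominated sorting over paths scored on the three objectives named in the abstract; the names (`Path`, `fast_nondominated_sort`) and the toy objective values are illustrative assumptions, not the authors' code.

```python
# Minimal sketch, assuming each reasoning path is already scored on
# accuracy and coherence (maximize) and computational cost (minimize).
# `Path` and `fast_nondominated_sort` are hypothetical names for illustration.
from dataclasses import dataclass

@dataclass
class Path:
    steps: list[str]   # the chain of reasoning steps
    accuracy: float    # higher is better
    coherence: float   # higher is better
    cost: float        # lower is better (computational penalty)

def dominates(a: Path, b: Path) -> bool:
    """Pareto dominance: `a` is no worse than `b` on every objective
    and strictly better on at least one."""
    no_worse = (a.accuracy >= b.accuracy and
                a.coherence >= b.coherence and
                a.cost <= b.cost)
    strictly_better = (a.accuracy > b.accuracy or
                       a.coherence > b.coherence or
                       a.cost < b.cost)
    return no_worse and strictly_better

def fast_nondominated_sort(pop: list[Path]) -> list[list[Path]]:
    """Partition the population into Pareto fronts, as in NSGA-II:
    front 0 is the non-dominated set; front k is non-dominated once
    fronts 0..k-1 are removed."""
    fronts: list[list[Path]] = []
    remaining = list(pop)
    while remaining:
        front = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining if q is not p)]
        fronts.append(front)
        remaining = [p for p in remaining if p not in front]
    return fronts

if __name__ == "__main__":
    pop = [
        Path(["step A"], accuracy=0.9, coherence=0.8, cost=3.0),
        Path(["step B"], accuracy=0.7, coherence=0.9, cost=1.0),
        Path(["step C"], accuracy=0.6, coherence=0.6, cost=2.5),  # dominated by B
    ]
    for i, front in enumerate(fast_nondominated_sort(pop)):
        print(f"front {i}: {[p.steps for p in front]}")
```

In a full NSGA-II loop this sorting step would be followed by crowding-distance selection and by mutation/crossover of paths (e.g., resampling individual reasoning steps) to produce the next generation.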
Submission Number: 92