Reversal Is Structural: Concept-Aware Post-Training Recovers Rare, Deep Mathematical Skills

Published: 16 Oct 2025 · Last Modified: 12 Nov 2025 · NeurIPS 2025 ER Workshop · CC BY 4.0
Keywords: LLMs, Post-Training, Self-Improvement, Mathematical Skills
Abstract: Solution-based post-training, ranging from self-taught rationales to iterative preference learning, is credited with improving mathematical problem solving in large language models. Recent evidence, however, shows a "self-improvement reversal": as *pass@1* rises, models can lose breadth and robustness. We argue this is structural, not a statistical quirk. Drawing on Knowledge Space Theory, we recast mathematical reasoning as movement over a prerequisite concept graph and induce a sparse problem-concept mapping via an automatic pipeline, **AutoKST**, enabling concept-aware diagnostics beyond accuracy. Applied to challenging math benchmarks, **AutoKST** reveals that regressions localize to the graph's *fringe*: rare, prerequisite-heavy skills that headline scores overlook. A linearized view of post-training explains why frequency-skewed updates with coupled gradients naturally drift away from these low-frequency directions. Guided by this account, we propose *Fringe-Theorem Training* (FTT): a lightweight regimen that combines frequency-aware loss reweighting, projection-based gradient safeguards, and a fringe-focused micro-curriculum. In controlled studies, FTT improves *pass@1* while restoring fringe competence and prerequisite adherence; for example, compared with STaR, it substantially raises fringe performance and coverage, improves consistency, lowers calibration error, and, paired with early stopping, reduces reasoning tokens. By turning post-training evaluation into concept-level measurement, our framework distinguishes genuine self-improvement from structural regression and offers a practical path to the former.
Submission Number: 296
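
The abstract names frequency-aware loss reweighting and projection-based gradient safeguards as two FTT components but gives no implementation details. The sketch below is a hypothetical illustration of those two ideas in isolation, not the paper's code: the function names `frequency_weights` and `project_conflict`, the inverse-frequency exponent `alpha`, and the toy concept counts are all assumptions; the fringe-focused micro-curriculum is not sketched.

```python
# Hypothetical sketch (not the paper's released code): inverse-frequency loss
# reweighting plus a PCGrad-style projection that keeps updates driven by
# frequent concepts from overwriting gradients needed by rare "fringe" concepts.
import torch


def frequency_weights(concept_counts, alpha=0.5):
    """Inverse-frequency weights, normalized to mean 1 (assumed scheme)."""
    counts = torch.as_tensor(concept_counts, dtype=torch.float32)
    w = counts.clamp(min=1.0) ** (-alpha)
    return w / w.mean()


def project_conflict(g_freq, g_rare, eps=1e-12):
    """If the frequent-concept gradient opposes the rare-concept gradient,
    remove its conflicting component (a PCGrad-style safeguard)."""
    dot = torch.dot(g_freq, g_rare)
    if dot < 0:
        g_freq = g_freq - dot / (g_rare.norm() ** 2 + eps) * g_rare
    return g_freq


if __name__ == "__main__":
    # Toy usage on a flattened parameter vector.
    torch.manual_seed(0)
    params = torch.zeros(8)
    g_frequent = torch.randn(8)  # gradient from frequently seen concepts
    g_fringe = torch.randn(8)    # gradient from rare, prerequisite-heavy concepts

    w = frequency_weights([500, 3])  # e.g. sample counts for [frequent, fringe]
    g_frequent = w[0] * project_conflict(g_frequent, g_fringe)
    g_fringe = w[1] * g_fringe

    params -= 0.1 * (g_frequent + g_fringe)  # combined, frequency-aware update
    print(params)
```

Up-weighting rare concepts and dropping conflicting gradient components are standard ways to keep low-frequency directions from being washed out by frequency-skewed updates; whether FTT applies them at the concept, batch, or example level is not specified in the abstract.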