Modeling Chain-of-Thought Collapse in Pruned Language Models: Fidelity and Similarity Analysis for Mathematical Reasoning

Published: 17 Oct 2025, Last Modified: 21 Nov 2025. MATH-AI 2025 Poster. CC BY 4.0
Keywords: Weight-space similarity, Model compression, Pruning, Reasoning faithfulness, Chain-of-thought degradation
Abstract: Efficient mathematical reasoning under compute and memory constraints is crucial for deploying large reasoning models (LRMs) in real-world applications. We propose a framework to quantify the relationship between model similarity and the loss of reasoning fidelity in chain-of-thought (CoT) outputs under pruning. Our approach introduces ASAND, a similarity metric that combines centered alignment, sparsity-aware structural measures, and adaptive exponential decay to capture subtle, non-monotonic changes in reasoning fidelity. Experiments on Qwen-0.5B with the GSM8K dataset demonstrate that light pruning can unexpectedly improve CoT reasoning, whereas aggressive sparsity leads to catastrophic collapse. Correlation analyses indicate that ASAND outperforms standard similarity metrics, achieving the highest predictive power for reasoning fidelity degradation. These findings provide actionable insights for compression-aware deployment of LRMs, enabling efficient reasoning on resource-constrained devices without sacrificing correctness. To validate ASAND, we extend our analysis of pruning effects on mathematical reasoning from grade-school problems (GSM8K) to competition-level mathematics (the MATH dataset).
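The abstract does not define ASAND's formula, so the following is only an illustrative sketch of a metric with the three ingredients named above: a centered-alignment term (here approximated by linear CKA between weight matrices), a sparsity-aware structural term, and an exponential decay in the sparsity gap. The function names, weighting scheme, and hyperparameters (`alpha`, `beta`) are assumptions for illustration, not the authors' definition.

```python
# Illustrative sketch only: ASAND's exact formulation is not given in this
# abstract. Components shown: (a) linear CKA-style centered alignment,
# (b) a sparsity-agreement structural term, and (c) an adaptive exponential
# decay in the sparsity gap. All specifics are hypothetical.
import numpy as np

def linear_cka(X, Y):
    """Linear centered kernel alignment between two weight matrices."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

def sparsity(W, tol=1e-8):
    """Fraction of near-zero entries in a weight matrix."""
    return float(np.mean(np.abs(W) < tol))

def asand_like(W_dense, W_pruned, alpha=0.5, beta=5.0):
    """Hypothetical ASAND-style score: alignment attenuated by an
    exponential decay in the sparsity gap, mixed with a structural term."""
    cka = linear_cka(W_dense, W_pruned)
    gap = abs(sparsity(W_pruned) - sparsity(W_dense))
    decay = np.exp(-beta * gap)   # exponential decay in sparsity gap (assumed form)
    struct = 1.0 - gap            # sparsity-aware structural agreement (assumed form)
    return alpha * cka * decay + (1.0 - alpha) * struct
```

By construction, an unpruned model compared with itself scores 1.0, and the score falls as pruning both rotates the weight subspace (lowering CKA) and widens the sparsity gap (shrinking the decay and structural terms), which is the qualitative behavior the abstract attributes to ASAND.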
Submission Number: 82