Modeling Chain-of-Thought Degradation in Pruned Language Models: Fidelity and Similarity Analysis for Mathematical Reasoning
Keywords: Chain-of-Thought Reasoning, Mathematical Reasoning, Language Models, Model Fidelity, Similarity Metrics, Model Compression, Pruning
Abstract: Efficient mathematical reasoning under compute and memory constraints is crucial for deploying large reasoning models (LRMs) in resource-constrained, real-world settings. We propose a framework that quantifies the relationship between model similarity and the loss of reasoning fidelity in chain-of-thought (CoT) outputs under pruning. The framework introduces ASAND, a similarity metric that integrates centered alignment, sparsity-aware structural measures, and adaptive exponential decay to capture non-monotonic changes in reasoning fidelity, a behavior that traditional pruning metrics miss. Experiments with the Qwen-0.5B model on the GSM8K dataset show that moderate pruning can improve CoT reasoning performance, whereas excessive sparsity leads to catastrophic collapse. In correlation analyses, ASAND outperforms standard similarity metrics, providing the highest predictive power for reasoning fidelity degradation. To further validate ASAND, we extend the analysis to competition-level mathematical reasoning on the MATH dataset, demonstrating its broader applicability to more complex problem domains. These findings offer insights for compression-aware deployment of LRMs, enabling efficient reasoning on resource-constrained devices without sacrificing accuracy.
Submission Number: 26
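Illustrative sketch: The abstract compares ASAND against standard similarity metrics but does not give ASAND's formula or name the baselines. The Python sketch below is therefore not ASAND. It assumes linear centered kernel alignment (CKA) as one plausible baseline and unstructured magnitude pruning as the compression step, applied to a toy random linear layer rather than Qwen-0.5B. The names `magnitude_prune` and `linear_cka` are hypothetical helpers introduced only for illustration.

```python
import numpy as np


def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction zeroed.

    Assumption: unstructured magnitude pruning; the abstract does not say
    which pruning scheme the authors use.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(round(sparsity * weights.size))
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value acts as the pruning threshold.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned


def linear_cka(x, y):
    """Linear CKA between two activation matrices of shape (samples, features)."""
    x = x - x.mean(axis=0, keepdims=True)  # center each feature
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, "fro")
    norm_y = np.linalg.norm(y.T @ y, "fro")
    return float(cross / (norm_x * norm_y))


# Toy stand-in for one layer of a model: random inputs and weights.
rng = np.random.default_rng(0)
inputs = rng.normal(size=(256, 64))
dense_w = rng.normal(size=(64, 32))
dense_act = inputs @ dense_w

# Similarity between dense and pruned activations at several sparsity levels.
similarities = {}
for s in (0.0, 0.3, 0.6, 0.9):
    pruned_act = inputs @ magnitude_prune(dense_w, s)
    similarities[s] = linear_cka(dense_act, pruned_act)

for s, value in similarities.items():
    print(f"sparsity={s:.1f}  linear CKA={value:.4f}")
```

In a study of the kind the abstract describes, similarity scores like these would be paired with per-sparsity reasoning accuracy (for example on GSM8K) and compared by correlation; that step is omitted here because it requires real model outputs.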