Track: tiny / short paper (up to 5 pages)
Keywords: Chain-of-Thought Reasoning, Mathematical Reasoning, Language Models, Model Fidelity, Similarity Metrics, Model Compression, Pruning
Abstract: Pruning large reasoning models for edge deployment degrades performance in ways that standard accuracy metrics systematically fail to detect. We show that the relationship between sparsity and chain-of-thought (CoT) faithfulness is non-monotonic: light pruning ($\leq 5\%$) improves reasoning consistency by removing low-magnitude interference, while sparsity beyond $30\%$ triggers a catastrophic collapse of logical coherence. To diagnose this behavior, we present ASAND (Adaptive Sparsity-Adjusted Normalized Distance), a geometry-aware similarity metric that jointly models centered weight alignment, structural sparsity, adaptive exponential decay, and weight-distribution volatility. On Qwen-0.5B evaluated on GSM8K and competition-level MATH problems, ASAND achieves a PLCC of 0.948 and 0.972, respectively, outperforming cosine similarity, $L_1$/$L_2$ distances, and CKA.
These results establish sparsity-aware representational geometry as a necessary lens for safe, reasoning-preserving model compression.
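To make the ingredients named in the abstract concrete, the following is a minimal illustrative sketch of a sparsity-aware weight distance. It is not the paper's ASAND formula (which is not given in the abstract); the function name, the weighting parameters `alpha` and `beta`, and the way the terms are combined are all assumptions chosen only to show how centered alignment, a structural-sparsity term, an exponential decay, and a volatility term could interact.

```python
import numpy as np

def asand_like_distance(w_ref, w_pruned, alpha=1.0, beta=0.5):
    """Illustrative sparsity-aware distance between two weight tensors.

    NOTE: hypothetical sketch, not the paper's ASAND definition. It only
    combines the ingredients the abstract names: centered alignment,
    structural sparsity, exponential decay, and weight-distribution
    volatility. `alpha` and `beta` are assumed penalty weights.
    """
    a = w_ref.ravel() - w_ref.mean()
    b = w_pruned.ravel() - w_pruned.mean()
    # Centered alignment (centered cosine similarity in [-1, 1]).
    align = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    # Structural sparsity mismatch: gap between the zero-weight fractions.
    sparsity_gap = abs(float(np.mean(w_ref == 0)) - float(np.mean(w_pruned == 0)))
    # Volatility: relative change in weight-distribution spread.
    vol = abs(float(w_ref.std()) - float(w_pruned.std())) / (float(w_ref.std()) + 1e-12)
    # Exponential decay turns the penalties into a (0, 1] fidelity factor.
    penalty = float(np.exp(-(alpha * sparsity_gap + beta * vol)))
    # Distance: 0 for identical geometry, larger as fidelity degrades.
    return 1.0 - align * penalty
```

Under this sketch, a tensor compared with itself yields a distance near zero, while magnitude pruning increases the distance through all three penalty channels at once.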
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 3