Keywords: feature attribution, probing
Abstract: Model quantization is essential for deploying large language models (LLMs). However, quantized models often exhibit unpredictable failures in instruction following, including unintended language switching, violation of formatting constraints, and degenerative generation.
We present \textbf{Deep Attention Stimulation (DAS)}, a training-free intervention that selectively compensates the attention heads most disrupted by quantization. Inspired by targeted neural stimulation in cognitive neuroscience, DAS identifies critical attention heads by analyzing activation differences between full-precision and quantized models on instruction-following failure cases. Through a qualitative analysis of 10 carefully selected samples from IFEval, we show that injecting small corrective signals into these heads can recover instruction-following behavior.
In particular, a 4-bit GPTQ-quantized Qwen2.5-7B-Instruct model recovers correct English output and avoids repetitive degeneration under moderate stimulation. Our pilot study suggests that quantization-induced instruction-following failures are localized to a small subset of attention heads rather than uniformly distributed across the model. These findings highlight the potential of interpretable, targeted post-quantization repair mechanisms.
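The abstract describes two steps: ranking heads by the activation gap between full-precision and quantized runs, then nudging the disrupted heads with a small corrective signal. A minimal NumPy sketch of that idea follows; the function names (`select_critical_heads`, `stimulate`), the mean-gap ranking criterion, and the stimulation strength `alpha` are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def select_critical_heads(fp_acts, q_acts, k=3):
    """Rank heads by mean absolute activation gap between the
    full-precision and quantized models on failure cases.
    fp_acts, q_acts: arrays of shape (num_heads, num_samples, d_head).
    Returns the indices of the k most disrupted heads and all gap scores.
    (Hypothetical criterion; the paper's selection rule may differ.)"""
    gap = np.abs(fp_acts - q_acts).mean(axis=(1, 2))  # per-head disruption score
    return np.argsort(gap)[::-1][:k], gap

def stimulate(q_acts, fp_acts, heads, alpha=0.5):
    """Inject a small corrective signal into the selected heads:
    shift quantized activations toward the full-precision mean,
    scaled by a stimulation strength alpha (assumed hyperparameter)."""
    out = q_acts.copy()
    # Per-head mean offset between full-precision and quantized activations.
    delta = fp_acts.mean(axis=1, keepdims=True) - q_acts.mean(axis=1, keepdims=True)
    out[heads] += alpha * delta[heads]
    return out

# Toy example: 8 heads, 10 failure samples, head dimension 4.
rng = np.random.default_rng(0)
fp = rng.normal(size=(8, 10, 4))
q = fp.copy()
q[2] += 1.0  # simulate one head heavily disrupted by quantization

heads, gap = select_critical_heads(fp, q, k=1)
repaired = stimulate(q, fp, heads, alpha=1.0)
```

In this toy setup the disruption is a constant offset on one head, so the mean-shift correction removes it exactly; in a real model the gap is input-dependent and a moderate `alpha` would only partially close it.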
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Interpretability and Analysis of Models for NLP, Efficient/Low-Resource Methods for NLP
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 7892