Entropy Meets Importance: A Unified Head Importance–Entropy Score for Stable and Efficient Transformer Pruning
Keywords: Transformer Architecture, Attention Head Pruning, Model Stability, Attention Entropy
Abstract: Transformer-based models have achieved remarkable performance in NLP tasks. However, their structural characteristics—multiple layers and attention heads—introduce challenges in inference and deployment. To address these challenges, various pruning methods have recently been proposed. Notably, gradient-based methods using Head Importance Scores (HIS) have gained traction for their interpretability, efficiency, and ability to identify redundant heads. However, HIS alone is limited: it captures only each head's gradient-driven contribution and overlooks the diversity of its attention patterns. To overcome this limitation, we introduce a novel pruning criterion, HIES (Head Importance-Entropy Score), which integrates head importance scores with attention entropy, providing complementary evidence on per-head contribution. Empirically, HIES-based pruning yields up to 17.62\% improvement in model quality and $2.05\times$ improvement in stability over HIS-only methods, enabling substantial model compression without sacrificing either accuracy or stability.
Submission Number: 51
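The abstract does not specify the exact HIES formulation, only that it combines gradient-based head importance scores with attention entropy. The snippet below is a minimal, hypothetical sketch of that idea, assuming a convex combination of min-max-normalized HIS and average per-head attention entropy; the names `attention_entropy`, `hies_scores`, and the weight `alpha` are illustrative, not taken from the paper.

```python
import torch

def attention_entropy(attn_probs: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Average entropy of each head's attention distributions.

    attn_probs: (batch, heads, query_len, key_len); each row sums to 1.
    Returns a (heads,) tensor of mean entropies.
    """
    ent = -(attn_probs * (attn_probs + eps).log()).sum(dim=-1)  # (batch, heads, query_len)
    return ent.mean(dim=(0, 2))                                 # average over batch and queries

def hies_scores(his: torch.Tensor, entropy: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """Hypothetical HIES: convex mix of min-max-normalized importance and entropy."""
    def norm(x: torch.Tensor) -> torch.Tensor:
        return (x - x.min()) / (x.max() - x.min() + 1e-12)
    return alpha * norm(his) + (1 - alpha) * norm(entropy)

# Toy usage: 8 heads, random attention maps and gradient-based importance scores.
attn = torch.softmax(torch.randn(4, 8, 16, 16), dim=-1)
his = torch.rand(8)
scores = hies_scores(his, attention_entropy(attn))
prune_order = torch.argsort(scores)  # heads with the lowest HIES would be pruned first
```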