HEART-ViT: HESSIAN-GUIDED EFFICIENT DYNAMIC ATTENTION AND TOKEN PRUNING IN VISION TRANSFORMERS

ICLR 2026 Conference Submission 22556 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Vision Transformers (ViTs), Dynamic pruning, Hessian-based sensitivity, Token and head pruning, Edge-efficient inference
Abstract: Vision Transformers (ViTs) deliver state-of-the-art accuracy, but their quadratic attention cost and redundant computations severely hinder deployment on latency- and resource-constrained platforms. Existing pruning approaches treat either tokens or heads in isolation, relying on heuristics or first-order signals, which often sacrifice accuracy or fail to generalize across inputs. We introduce HEART-ViT, a Hessian-guided efficient dynamic attention and token pruning framework for vision transformers, which, to the best of our knowledge, is the first unified, second-order, input-adaptive framework for ViT optimization. HEART-ViT estimates curvature-weighted sensitivities of both tokens and attention heads using efficient Hessian–vector products, enabling principled pruning decisions under explicit loss budgets. This dual-view sensitivity reveals an important structural insight: token pruning dominates computational savings, while head pruning provides fine-grained redundancy removal, and their combination achieves a superior trade-off. On ImageNet-100 and ImageNet-1K with ViT-B/16 and DeiT-B/16, HEART-ViT achieves up to 49.4% FLOPs reduction, 36% lower latency, and 46% higher throughput, while consistently matching or even surpassing baseline accuracy after fine-tuning (e.g., +4.7% recovery at 40% token pruning). Beyond benchmark results, we deploy HEART-ViT on edge devices such as the AGX Orin, demonstrating that our reductions in FLOPs and latency translate directly into real-world gains in inference speed and energy efficiency. HEART-ViT bridges the gap between theory and practice, delivering the first unified, curvature-driven pruning framework that is both accuracy-preserving and edge-efficient.
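The central computation described in the abstract is a curvature-weighted sensitivity score obtained from Hessian–vector products rather than an explicit Hessian. Below is a minimal PyTorch sketch of that idea, not the authors' implementation: the `token_mask` model argument, the `hvp_sensitivity` name, and the per-entry scoring rule are assumptions based on the description above.

```python
import torch

def hvp_sensitivity(model, loss_fn, images, labels, mask):
    """Curvature-weighted sensitivity of a soft pruning mask (one entry per
    token or attention head), estimated with a Hessian-vector product instead
    of materializing the Hessian. `token_mask` is a hypothetical model argument."""
    mask = mask.detach().requires_grad_(True)
    loss = loss_fn(model(images, token_mask=mask), labels)

    # First-order term: gradient of the loss w.r.t. the mask, kept in the
    # graph so it can be differentiated a second time.
    (grad,) = torch.autograd.grad(loss, mask, create_graph=True)

    # Hessian-vector product H @ v without forming H: differentiate the
    # gradient along the direction v = current mask values.
    v = mask.detach()
    (hv,) = torch.autograd.grad(grad, mask, grad_outputs=v)

    # Per-entry second-order Taylor estimate of the loss change when an entry
    # is pruned (assumed scoring rule, in the spirit of OBD-style criteria).
    return (grad * mask + 0.5 * mask * hv).abs().detach()
```

Under this reading, entries with the smallest scores would be the first candidates to prune under a given loss budget, with tokens and heads scored by the same recipe.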
Primary Area: optimization
Submission Number: 22556