An Adaptive Scheme of Threshold Adjustment for Dynamic Sparsity Extraction of Self-Attention Network

Published: 2025 · Last Modified: 05 Nov 2025 · AICAS 2025 · CC BY-SA 4.0
Abstract: Large Language Models (LLMs) and transformers have become highly successful across various domains. However, they are notorious for their quadratic computational complexity, which grows with sequence length. To mitigate this, dynamic sparsity techniques use low-precision estimations to identify and skip computation patterns with near-zero outputs: values whose estimates fall below a static threshold are pruned, reducing energy consumption and improving computation speed. By using low-cost estimations of minor threshold adjustments, we continuously monitor and fine-tune the pruning strategy to avoid overly aggressive pruning. Experimental results demonstrate that the proposed adaptive threshold method provides an average accuracy improvement of 0.15%, along with an average of 8.95% additional computational sparsity, across the SQuAD v1.1, SQuAD v2, SST-2, and MRPC datasets.
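The abstract describes the mechanism only at a high level, so the following is a minimal illustrative sketch rather than the paper's implementation. It assumes a reduced-precision score estimate (fp16 standing in for whatever cheap estimator the hardware provides), a scalar score threshold, and a simple sign-based update rule that nudges the threshold toward a target sparsity; the function and parameter names are hypothetical.

```python
import numpy as np

def adaptive_sparse_attention(q, k, v, threshold, target_sparsity=0.5, step=0.01):
    """Single-head attention that prunes low score estimates, then nudges
    the threshold toward a target sparsity. Illustrative sketch only; the
    paper's actual estimator, monitor, and update rule are not specified here."""
    d = q.shape[-1]
    scale = 1.0 / np.sqrt(d)

    # Low-precision estimate of the attention scores (fp16 as a stand-in
    # for the accelerator's low-cost estimation path).
    est = (q.astype(np.float16) @ k.T.astype(np.float16)).astype(np.float32) * scale

    keep = est >= threshold
    # Guard: always keep each row's strongest score so softmax stays defined.
    keep |= est == est.max(axis=-1, keepdims=True)
    sparsity = 1.0 - keep.mean()  # fraction of score entries skipped

    # Full-precision scores are computed only where the estimate survived pruning.
    scores = np.where(keep, (q @ k.T) * scale, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ v

    # Adaptive adjustment: loosen the threshold if pruning is too aggressive,
    # tighten it if there is headroom (a simple proportional rule).
    new_threshold = threshold + step * np.sign(target_sparsity - sparsity)
    return out, new_threshold, sparsity

# Toy usage: run a few steps and watch the threshold settle.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 64)).astype(np.float32) for _ in range(3))
thr = 0.0
for _ in range(5):
    out, thr, sp = adaptive_sparse_attention(q, k, v, thr)
    print(f"threshold={thr:+.3f}  sparsity={sp:.2%}")
```

The sign-based update is the simplest possible monitor-and-adjust loop; the paper's continuous monitoring scheme presumably uses richer statistics from its low-cost estimations, but the feedback structure, measure sparsity, then adjust the threshold in small steps, is the same.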