Keywords: Neural Network Sparsity Optimization, Structured Sparsity, Extended Hoyer Score, Extended Hoyer Score Geometry, Fast Hoyer Projection
TL;DR: This paper introduces a novel and efficient projection method based on the extended Hoyer score, designed to induce sparsity in neural networks.
Abstract: Deep networks require sparsity mechanisms that are both scale-invariant and computationally efficient. Existing approaches based on the Hoyer score rely on non-convex projections, resulting in unstable heuristics and potential convergence issues.
In this paper, we introduce the Cone Alignment Index (CAI), a new convex constraint whose level sets form a Lorentz hypercone. This geometric structure enables the first Closed-Form Projection (CFP) onto such a cone, requiring only a single interpolation step and enjoying guaranteed convergence. We derive analytical expressions for:
(i) computing the active set through a provably correct threshold rule, and
(ii) performing the final projection using a closed-form interpolation coefficient; a sketch of this style of projection is given below.
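For intuition, the Euclidean projection onto the standard Lorentz (second-order) cone $K = \{(x, t) : \|x\|_2 \le t\}$ admits exactly this kind of single-interpolation-step closed form. The sketch below shows that classical projection as an illustration of the style of computation described above; it is not the paper's CAI-specific cone or threshold rule.

```python
import numpy as np

def project_lorentz_cone(v, s):
    """Closed-form Euclidean projection of the point (v, s) onto the
    standard Lorentz (second-order) cone K = {(x, t) : ||x||_2 <= t}.

    Three cases: the point is already in K; it lies in the polar cone
    and is projected to the apex; or it is mapped to the boundary of K
    by a single closed-form interpolation coefficient.
    """
    norm_v = np.linalg.norm(v)
    if norm_v <= s:                          # already inside the cone
        return v, s
    if norm_v <= -s:                         # inside the polar cone -> apex
        return np.zeros_like(v), 0.0
    alpha = (norm_v + s) / (2.0 * norm_v)    # interpolation coefficient
    return alpha * v, (norm_v + s) / 2.0     # point on the cone boundary
```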
Building on this result, we propose a fast bilevel projection method, composed solely of successive CFP steps, with guaranteed convergence, that naturally induces hardware-friendly column-wise (or row-wise) sparsity.
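As a purely illustrative sketch of how a bilevel scheme of this kind could induce column-wise sparsity, the hypothetical code below applies the vector projection above to each column of a weight matrix independently: columns whose mass falls into the polar-cone case are zeroed wholesale. The function and parameter names are illustrative; the paper's actual bilevel CAI projection is not reproduced here.

```python
def columnwise_projection(W, s):
    """Hypothetical bilevel-style sketch: project every column of W with
    the closed-form cone projection above. Columns falling into the
    polar-cone case are set entirely to zero, yielding column-wise
    structured sparsity. `s` is an illustrative scalar controlling how
    aggressively columns are pruned (more negative -> more zeroed columns).
    """
    W_proj = np.empty_like(W)
    for j in range(W.shape[1]):
        x, _ = project_lorentz_cone(W[:, j], s)
        W_proj[:, j] = x
    return W_proj
```

Since each column is projected independently, a scheme of this shape is embarrassingly parallel, which is consistent with the hardware-friendly structured sparsity the method targets.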
Thanks to these CFP algorithms, our method is up to 6.5 times faster than the original Hoyer projection on vectors, and our bilevel CFP algorithm is $2r$ times faster than the HALS algorithm on matrices.
Applied to transformer attention matrices on biomedical and NLP datasets (GLUE benchmark), it achieves up to $96\%$ sparsity with negligible accuracy degradation, outperforming state-of-the-art "universal BigBird" masks.
Overall, this work provides a principled, convex, and scalable alternative to Hoyer-based sparsification, opening the door to energy-efficient LLMs with controllable structured sparsity.
Supplementary Material: zip
Primary Area: optimization
Submission Number: 8853