Keywords: Neural Network Sparsity Optimization, Structured Sparsity, Extended Hoyer Score, Extended Hoyer Score Geometry, Fast Hoyer Projection
TL;DR: This paper introduces a novel and efficient projection method based on the extended Hoyer score, designed to induce sparsity in neural networks.
Abstract: Deep networks require sparsity mechanisms that are both scale-invariant and computationally efficient. Existing approaches based on the Hoyer score rely on non-convex projections, resulting in unstable heuristics and potential convergence issues.
In this paper, we introduce the Cone Alignment Index (CAI), a new convex constraint whose level sets form a Lorentz hypercone. This geometric structure enables the first Closed-Form Projection (CFP) onto such a cone, requiring only a single interpolation step and enjoying guaranteed convergence. We derive analytical expressions for:
(i) computing the active set through a provably correct threshold rule, and
(ii) performing the final projection using a closed-form interpolation coefficient; a sketch of this style of projection is given below.
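For intuition, the Euclidean projection onto the standard Lorentz (second-order) cone $K = \{(x, t) : \|x\|_2 \le t\}$ admits exactly this kind of single-interpolation-step closed form. The sketch below shows that classical projection as an illustration of the style of computation described above; it is not the paper's CAI-specific cone or threshold rule.

```python
import numpy as np

def project_lorentz_cone(v, s):
    """Closed-form Euclidean projection of the point (v, s) onto the
    standard Lorentz (second-order) cone K = {(x, t) : ||x||_2 <= t}.

    Three cases: the point is already in K; it lies in the polar cone
    and is projected to the apex; or it is mapped to the boundary of K
    by a single closed-form interpolation coefficient.
    """
    norm_v = np.linalg.norm(v)
    if norm_v <= s:                          # already inside the cone
        return v, s
    if norm_v <= -s:                         # inside the polar cone -> apex
        return np.zeros_like(v), 0.0
    alpha = (norm_v + s) / (2.0 * norm_v)    # interpolation coefficient
    return alpha * v, (norm_v + s) / 2.0     # point on the cone boundary
```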
Building on this result, we propose a fast bilevel projection method, composed solely of successive CFP steps, with guaranteed convergence, that naturally induces hardware-friendly column-wise (or row-wise) sparsity.
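As a purely illustrative sketch of how a bilevel scheme of this kind could induce column-wise sparsity, the hypothetical code below applies the vector projection above to each column of a weight matrix independently: columns whose mass falls into the polar-cone case are zeroed wholesale. The function and parameter names are illustrative; the paper's actual bilevel CAI projection is not reproduced here.

```python
def columnwise_projection(W, s):
    """Hypothetical bilevel-style sketch: project every column of W with
    the closed-form cone projection above. Columns falling into the
    polar-cone case are set entirely to zero, yielding column-wise
    structured sparsity. `s` is an illustrative scalar controlling how
    aggressively columns are pruned (more negative -> more zeroed columns).
    """
    W_proj = np.empty_like(W)
    for j in range(W.shape[1]):
        x, _ = project_lorentz_cone(W[:, j], s)
        W_proj[:, j] = x
    return W_proj
```

Since each column is projected independently, a scheme of this shape is embarrassingly parallel, which is consistent with the hardware-friendly structured sparsity the method targets.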
Thanks to these CFP algorithms, our method is up to 6.5 times faster than the original Hoyer projection on vectors, and our bilevel CFP algorithm is $2r$ times faster than the HALS algorithm on matrices.
Applied to transformer attention matrices on biomedical and NLP datasets (GLUE benchmark), it achieves up to $96\%$ sparsity with negligible accuracy degradation, outperforming state-of-the-art "universal BigBird" masks.
Overall, this work provides a principled, convex, and scalable alternative to Hoyer-based sparsification, opening the door to energy-efficient LLMs with controllable structured sparsity.
Supplementary Material: zip
Primary Area: optimization
Submission Number: 8853