Mask in the Mirror: Implicit Sparsification

Published: 06 Mar 2026, Last Modified: 06 Mar 2026. CPAL 2026 (Recent Spotlight Track) Poster. License: CC BY 4.0
Keywords: Sparse Training, Continuous sparsification, Implicit bias, Mirror flow, Time-dependent Bregman function, Regularization, Rich regime
TL;DR: Continuous sparsification shifts the implicit bias from L2 to L1 via a time-varying mirror flow, yielding smoother sparsification. PILoT controls and eventually turns off explicit regularization while preserving an implicit L1 bias, improving performance.
Abstract: Continuous sparsification strategies are among the most effective methods for reducing the inference costs and memory demands of large-scale neural networks. A key factor in their success is the implicit $L_1$ regularization induced by jointly learning both mask and weight variables, which has been shown experimentally to outperform explicit $L_1$ regularization. We provide a theoretical explanation for this observation by analyzing the learning dynamics, revealing that early continuous sparsification is governed by an implicit $L_2$ regularization that gradually transitions to an $L_1$ penalty over time. Leveraging this insight, we propose a method to dynamically control the strength of this implicit bias. Through an extension of the mirror flow framework, we establish convergence and optimality guarantees in the context of underdetermined linear regression. Our theoretical findings may be of independent interest, as we demonstrate how to enter the rich regime and show that the implicit bias can be controlled via a time-dependent Bregman potential. To validate these insights, we introduce PILoT, a continuous sparsification approach with novel initialization and dynamic regularization, which consistently outperforms baselines in standard experiments.
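The mechanism the abstract refers to can be illustrated with a minimal sketch (this is not the paper's PILoT method or initialization; it is a generic mask-times-weight, i.e. Hadamard-product, parameterization $w = m \odot u$). Running plain gradient descent jointly on $(m, u)$ with a small initialization scale pushes the dynamics toward the rich regime, where the effective penalty on $w$ behaves like $L_1$ rather than $L_2$. The setup below, on underdetermined linear regression, is illustrative: the scale `alpha`, learning rate, step count, and the nonnegative ground truth (which keeps the symmetric initialization $u = m$ valid, since $w = m \odot u$ then stays nonnegative) are all assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Underdetermined linear regression: n samples, d >> n features,
# with a k-sparse, nonnegative ground truth (illustrative setup).
n, d, k = 20, 100, 3
X = rng.standard_normal((n, d)) / np.sqrt(n)
w_star = np.zeros(d)
support = rng.choice(d, size=k, replace=False)
w_star[support] = np.abs(rng.standard_normal(k)) + 0.5
y = X @ w_star

# Mask-times-weight (Hadamard) parameterization: w = m * u.
# A small initialization scale alpha (assumed hyperparameter) drives the
# dynamics toward the rich regime, where the implicit bias approaches L1.
alpha = 1e-4
u = np.full(d, alpha)
m = np.full(d, alpha)

lr, steps = 0.02, 200_000
for _ in range(steps):
    w = m * u
    g = X.T @ (X @ w - y)   # gradient of 0.5 * ||X w - y||^2 w.r.t. w
    du = g * m              # chain rule through w = m * u
    dm = g * u
    u -= lr * du            # simultaneous update: both grads computed
    m -= lr * dm            # before either variable moves

w = m * u
print("residual         :", np.linalg.norm(X @ w - y))
print("||w||_1 (fitted) :", np.abs(w).sum())
print("||w||_1 (truth)  :", np.abs(w_star).sum())
print("largest coords   :", np.sort(np.argsort(np.abs(w))[-k:]))
print("true support     :", np.sort(support))
```

With a small `alpha`, the fitted solution approximately recovers the sparse ground truth rather than the dense minimum-$L_2$-norm interpolator that plain gradient descent on $w$ would find. Under gradient flow, closely related Hadamard-product parameterizations are known to correspond to a mirror flow whose Bregman potential interpolates between a scaled $L_2$ penalty at large initialization scale and an $L_1$ penalty as the scale shrinks, which matches the $L_2$-to-$L_1$ transition the abstract describes.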
Submission Number: 19