On the Weight Density of L2-Regularized Linear Classification and Regression

Published: 03 Feb 2026, Last Modified: 03 Feb 2026. AISTATS 2026 Poster. CC BY 4.0
Abstract: For traditional linear models with the widely used $L_2$-regularizer, it is often assumed that the resulting models are dense. As a result, little attention has been paid to when the optimal solution of an $L_2$-regularized problem can actually be sparse. In this work, we rigorously prove that for $L_2$-regularized support vector classification/regression, the theoretical optimum can indeed be sparse when the data have sparse feature values. Surprisingly, we observe that some optimization methods fail to preserve this sparsity and instead produce fully dense numerical solutions, leading to unnecessary storage overhead. We explain this phenomenon through detailed analysis. In particular, we show for the first time that certain projected gradient methods for solving the dual problem naturally yield sparser numerical solutions than other optimization algorithms. By applying suitable algorithms that preserve numerical sparsity, storage can be reduced by up to 50%, which is highly advantageous for large-scale industrial applications.
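The mechanism behind the abstract's claim can be illustrated with a small sketch (not the paper's implementation): in the dual of the L2-regularized, L2-loss SVM, the primal weights are recovered as $w = \sum_i \alpha_i y_i x_i$, so a feature that is zero in every training example yields an exactly zero weight, and a projected gradient method on the dual preserves that exact zero. All data, the choice $C = 1$, and the iteration count below are hypothetical.

```python
import numpy as np

# Hypothetical toy data: feature column 2 is identically zero
# (an example of "sparse feature values" in the data).
rng = np.random.default_rng(0)
n, d = 40, 4
X = rng.standard_normal((n, d))
X[:, 2] = 0.0                      # this feature appears in no example
y = np.sign(X[:, 0] + 0.5 * X[:, 1])
y[y == 0] = 1.0

# Dual of the L2-regularized, L2-loss SVM (standard formulation):
#   min_a  0.5 * a^T (Q + I/(2C)) a - e^T a   s.t.  a >= 0,
# with Q_ij = y_i y_j x_i^T x_j.  C = 1 is a hypothetical choice.
C = 1.0
Yx = y[:, None] * X
Q = Yx @ Yx.T + np.eye(n) / (2 * C)

# Projected gradient: gradient step, then clip back to the feasible set a >= 0.
a = np.zeros(n)
eta = 1.0 / np.linalg.norm(Q, 2)   # step size from the spectral norm (Lipschitz constant)
for _ in range(2000):
    a = np.maximum(0.0, a - eta * (Q @ a - 1.0))

# Recover the primal weights: w_j = sum_i a_i y_i X_ij.
w = (a * y) @ X
print(w[2])                        # exactly 0.0: the absent feature never enters w
```

Because the zero feature contributes only exact floating-point zeros to the sum defining $w_2$, the numerical solution stays sparse; a solver that updates $w$ densely (e.g., with multiplicative shrinkage on every coordinate) would instead store a dense vector.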
Submission Number: 50