Revisiting Sparse Learning for Classification: Comprehensive Comparisons of $L_0$ and $L_1$ Approximations with Early Stopping
Abstract: Understanding the comparative performance of $L_0$ and $L_1$ models is crucial for developing accurate and efficient machine learning systems, particularly in noisy, real-world settings.
The current understanding in the literature is that $L_1$-penalized linear models perform better than $L_0$ models as noise increases.
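For concreteness, the two objectives under comparison can be written as follows (a standard formulation in our own notation, not taken from the submission):
$$
\hat{\beta}_{L_0} \in \arg\min_{\beta}\; \frac{1}{n}\sum_{i=1}^{n} \ell\!\left(y_i,\, x_i^\top \beta\right) + \lambda \lVert \beta \rVert_0,
\qquad
\hat{\beta}_{L_1} \in \arg\min_{\beta}\; \frac{1}{n}\sum_{i=1}^{n} \ell\!\left(y_i,\, x_i^\top \beta\right) + \lambda \lVert \beta \rVert_1,
$$
where $\ell$ is, e.g., the logistic loss for classification, $\lVert \beta \rVert_0$ counts nonzero coefficients (a nonconvex penalty, which is why approximate optimizers are needed), and $\lVert \beta \rVert_1 = \sum_j |\beta_j|$ is its convex surrogate.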
However, prior studies have largely relied on small, synthetic datasets and on limited comparisons between optimizers,
leaving experiments that reflect practitioner concerns underexplored.
We fill these gaps by testing multiple $L_0$ and $L_1$ approximate optimizers on a wider variety of real datasets, using a realistic workflow for a practitioner whose ultimate concern is empirical out-of-sample performance.
We demonstrate that empirical performance differences between $L_0$ and $L_1$ models depend significantly on the choice of optimizer and dataset characteristics.
In many cases, changing the optimization algorithm while holding the regularization penalty fixed
alters performance more than changing the penalty itself.
Together, our results show that $L_0$-penalized approximate optimizers with early stopping can remain competitive with $L_1$ models even on noisier datasets, and are more viable than previously recognized.
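The abstract does not name the optimizers studied; as one illustrative instance of the class it describes, the sketch below implements iterative hard thresholding (IHT), a standard $L_0$ approximate optimizer, for sparse logistic regression with early stopping on a held-out validation set. The function name, hyperparameters (`lr`, `patience`, `k`), and data setup are our own assumptions, not the submission's method.

```python
# A minimal sketch (not the paper's implementation) of an L0 approximate
# optimizer: iterative hard thresholding (IHT) for sparse logistic
# regression, with early stopping on validation loss.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def iht_logistic(X_tr, y_tr, X_val, y_val, k, lr=0.1,
                 max_iters=500, patience=20):
    """Gradient step on the logistic loss, then keep the k largest-magnitude
    coefficients (hard threshold). Stop when validation loss has not
    improved for `patience` consecutive iterations."""
    n, d = X_tr.shape
    beta = np.zeros(d)
    best_beta, best_val, stall = beta.copy(), np.inf, 0
    for _ in range(max_iters):
        # Gradient of the mean logistic loss for labels y in {0, 1}.
        grad = X_tr.T @ (sigmoid(X_tr @ beta) - y_tr) / n
        beta -= lr * grad
        # Hard thresholding: project onto the set of k-sparse vectors.
        keep = np.argsort(np.abs(beta))[-k:]
        mask = np.zeros(d, dtype=bool)
        mask[keep] = True
        beta[~mask] = 0.0
        # Early stopping on validation logistic loss.
        p = sigmoid(X_val @ beta)
        val_loss = -np.mean(y_val * np.log(p + 1e-12)
                            + (1 - y_val) * np.log(1 - p + 1e-12))
        if val_loss < best_val:
            best_val, best_beta, stall = val_loss, beta.copy(), 0
        else:
            stall += 1
            if stall >= patience:
                break
    return best_beta

# Usage on synthetic data with 5 truly active features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
true_b = np.zeros(50); true_b[:5] = 2.0
y = (X @ true_b + 0.5 * rng.normal(size=200) > 0).astype(float)
beta = iht_logistic(X[:150], y[:150], X[150:], y[150:], k=5)
```

In this sketch, early stopping serves as the implicit regularizer the abstract highlights: the returned coefficients are those that minimized out-of-sample loss, not the final iterate.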
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Anastasios_Kyrillidis2
Submission Number: 4264