Dual Feature Reduction for the Sparse-group Lasso and its Adaptive Variant

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: A feature reduction approach for the sparse-group lasso and adaptive sparse-group lasso that uses strong screening rules for both variables and groups. The two layers of screening greatly reduce model fitting time.
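For reference, one standard form of the sparse-group lasso objective (following Simon et al., 2013; the exact group weights and the adaptive variant used in this paper may differ) is:

```latex
\min_{\beta \in \mathbb{R}^p} \;
  \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
  + \alpha\lambda\,\lVert \beta \rVert_1
  + (1-\alpha)\lambda \sum_{g=1}^{G} \sqrt{p_g}\,\lVert \beta^{(g)} \rVert_2
```

where \(\beta^{(g)}\) is the block of coefficients belonging to group \(g\), \(p_g\) is the group size, and \(\alpha \in [0, 1]\) trades off the lasso (variable-level) and group lasso (group-level) penalties.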
Abstract: The sparse-group lasso performs variable and group selection simultaneously, combining the strengths of the lasso and group lasso. Its sparse-group penalty allows it to exploit grouping information, which has led to widespread use in genetics, a field that regularly involves the analysis of high-dimensional data. However, the sparse-group lasso can be computationally expensive, due to the added complexity of its shrinkage and the additional hyperparameter that requires tuning. This paper presents a novel feature reduction method, Dual Feature Reduction (DFR), that uses strong screening rules for the sparse-group lasso and the adaptive sparse-group lasso to reduce their input space before optimization, without affecting the optimality of the solution. DFR applies two layers of screening through the application of dual norms and subdifferentials. Through synthetic and real data studies, DFR is shown to drastically reduce the computational cost across many different scenarios.
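To illustrate the two-layer screening idea described in the abstract, the sketch below applies a generic strong-rule-style check first at the group level and then at the variable level along a decreasing regularization path. The function names, thresholds, and rule form are illustrative assumptions and are not the paper's exact DFR screening rules.

```python
import numpy as np

def soft_threshold(z, t):
    """Element-wise soft-thresholding S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def two_layer_strong_screen(X, residual, groups, lam_new, lam_prev, alpha):
    """Strong-rule-style two-layer screen for one sparse-group lasso path step.

    Layer 1 discards whole groups whose soft-thresholded gradient block has
    a small Euclidean norm; layer 2 discards individual variables inside the
    surviving groups.  This is a simplified illustration, not the exact DFR
    rules of the paper.
    """
    n = X.shape[0]
    grad = X.T @ residual / n            # per-feature correlation with the residual
    thresh = 2 * lam_new - lam_prev      # strong-rule threshold along the path
    keep_group = np.zeros(len(groups), dtype=bool)
    keep_var = np.zeros(X.shape[1], dtype=bool)

    for g, idx in enumerate(groups):
        # Layer 1: group-level check on the soft-thresholded gradient block
        block = soft_threshold(grad[idx], alpha * thresh)
        if np.linalg.norm(block) > (1 - alpha) * np.sqrt(len(idx)) * thresh:
            keep_group[g] = True
            # Layer 2: variable-level check within the surviving group
            keep_var[idx] = np.abs(grad[idx]) > alpha * thresh
    return keep_group, keep_var

# Toy usage: screen at the first path step, where the residual is simply y.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
y = rng.standard_normal(50)
groups = [np.arange(i, i + 5) for i in range(0, 20, 5)]
kg, kv = two_layer_strong_screen(X, y, groups, lam_new=0.1, lam_prev=0.15, alpha=0.5)
print(kg.sum(), "groups and", kv.sum(), "variables survive screening")
```

Only the variables that survive both layers would then be passed to the optimizer; the paper's contribution is deriving such rules (via dual norms and subdifferentials) so that screened-out features provably do not change the solution.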
Lay Summary: Discovering which features are important in large datasets, such as those found in genetics, can lead to better predictions and stronger scientific understanding. These discoveries reveal the features that truly drive outcomes, enabling more accurate models and insights. However, identifying these features is challenging, especially when analyzing thousands of them. One popular method, the sparse-group lasso, selects both individual features and groups of features by making use of grouping information. While powerful, it can be computationally demanding. We introduce Dual Feature Reduction (DFR), a method that speeds up this process without losing any accuracy. DFR applies two layers of mathematical checks to eliminate irrelevant features and groups before the full analysis, allowing the computations to be performed on a small fraction of the total data and leading to very large computational savings. Across synthetic and real datasets, DFR significantly reduced computation time, making feature selection more efficient and scalable.
Primary Area: General Machine Learning->Scalable Algorithms
Keywords: penalized regression, screening rules, Karush–Kuhn–Tucker, lasso, high-dimensional, sparse-group, feature reduction
Submission Number: 6131