Abstract: This paper introduces two novel criteria, one for feature selection and one for feature elimination, in the context of best subset selection, a benchmark problem in statistics and machine learning. From an optimization perspective, we revisit the classical selection and elimination criteria used by traditional best subset selection algorithms and show that they capture only part of the variation in the objective function when features enter or exit the model. By formulating and exactly solving the optimization subproblems associated with feature entry and exit, we propose new selection and elimination criteria and prove that, unlike the classical criteria, they make the optimal decision at each entry-and-exit step. Replacing the classical criteria with the proposed ones yields a family of enhanced best subset selection algorithms. These algorithms preserve the theoretical properties of the original algorithms while achieving significant gains, without increasing computational cost, across various scenarios and evaluation metrics on tasks such as compressed sensing and sparse regression.
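As a concrete illustration of the partial-versus-exact contrast described above, here is a minimal sketch in a least-squares setting; the setting, the function names (`classical_select`, `exact_select`), and the brute-force refitting are our own illustrative assumptions, not the paper's actual criteria. The classical rule scores each candidate feature individually by its correlation with the current residual (as in orthogonal matching pursuit), while the exact rule scores it by the refitted objective after it enters the support.

```python
import numpy as np

def classical_select(A, r, support):
    """Classical greedy entry rule (as in orthogonal matching pursuit):
    pick the column of A most correlated with the current residual r.
    Features are scored individually; the refit on the enlarged support
    is ignored at selection time."""
    scores = np.abs(A.T @ r)
    scores[list(support)] = -np.inf  # exclude features already in the model
    return int(np.argmax(scores))

def exact_select(A, y, support):
    """Exact entry rule: for each candidate j, solve the least-squares
    subproblem on support | {j} and pick the j minimizing the refitted
    objective ||y - A_S beta||^2."""
    best_j, best_obj = None, np.inf
    for j in range(A.shape[1]):
        if j in support:
            continue
        S = sorted(support | {j})
        beta, *_ = np.linalg.lstsq(A[:, S], y, rcond=None)
        obj = float(np.sum((y - A[:, S] @ beta) ** 2))
        if obj < best_obj:
            best_j, best_obj = j, obj
    return best_j
```

In the least-squares case the exact decrease from adding feature j also admits the closed form (a_j^T r)^2 / ||(I - P_S) a_j||^2, where P_S is the projection onto the current support, so an exact entry rule need not cost more than the classical one.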
Lay Summary: Selecting the most important features in high-dimensional data is a fundamental challenge in statistics and machine learning. Best subset selection is considered the gold standard for feature selection, but the problem is NP-hard. Many polynomial-time approximation algorithms have been proposed to solve it, and they rely on classical criteria to select or eliminate features; these criteria, however, only partially capture how each feature affects model performance.
We revisited this problem through an optimization lens and found that the existing criteria overlook interactions between features. By mathematically modeling the exact impact of adding or removing features as a subset, rather than individually, we developed two novel criteria: one for feature selection and one for feature elimination. The sketch below illustrates the elimination side.
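In the same hypothetical least-squares setting as before (again an illustration under our own assumptions, not the paper's method), a classical elimination rule drops the feature with the smallest refitted coefficient magnitude, whereas an exact rule refits without each feature in turn and drops the one whose removal increases the objective least:

```python
import numpy as np

def classical_eliminate(A, y, support):
    """Classical exit rule: refit on the current support and drop the
    feature with the smallest coefficient magnitude."""
    S = sorted(support)
    beta, *_ = np.linalg.lstsq(A[:, S], y, rcond=None)
    return S[int(np.argmin(np.abs(beta)))]

def exact_eliminate(A, y, support):
    """Exact exit rule: for each j in the support, refit without j and
    drop the feature whose removal increases the objective least."""
    best_j, best_obj = None, np.inf
    for j in support:
        S = sorted(support - {j})
        beta, *_ = np.linalg.lstsq(A[:, S], y, rcond=None)
        obj = float(np.sum((y - A[:, S] @ beta) ** 2))
        if obj < best_obj:
            best_j, best_obj = j, obj
    return best_j
```

When features are highly correlated the two rules can disagree sharply: a feature with a small coefficient may still be the only one explaining part of the response, which matches the high-correlation regime highlighted in the next paragraph.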
When integrated into existing algorithms, our criteria boost performance with almost no increase in computational cost. They preserve theoretical guarantees while consistently improving accuracy across diverse tasks, scenarios, and evaluation metrics. Both theoretical analysis and experimental results show that the new criteria have a significant advantage when features are highly correlated. This gives researchers a fundamentally new perspective on algorithm design for best subset selection, and the idea of optimal pursuit may extend to a wider range of machine learning scenarios and tasks.
Link To Code: https://github.com/ZhihanZhu-math/Optimal_Pursuit_public
Primary Area: General Machine Learning
Keywords: Best Subset Selection, Feature Selection and Elimination, Optimal Criteria, Compressed Sensing, Sparse Regression
Submission Number: 3211