- Keywords: Feature selection, classification, regression, survival analysis
- TL;DR: We develop a fully embedded feature selection method based directly on approximating the $\ell_0$ penalty.
- Abstract: Feature selection problems have been extensively studied in the setting of linear estimation, for instance LASSO, but less emphasis has been placed on feature selection for non-linear functions. In this study, we propose a method for feature selection in high-dimensional non-linear function estimation problems. The new procedure is based on directly penalizing the $\ell_0$ norm of features, or the count of the number of selected features. Our $\ell_0$ based regularization relies on a continuous relaxation of the Bernoulli distribution, which allows our model to learn the parameters of the approximate Bernoulli distributions via gradient descent. The proposed framework simultaneously learns a non-linear regression or classification function while selecting a small subset of features. We provide an information-theoretic justification for incorporating Bernoulli distribution into our approach. Furthermore, we evaluate our method using synthetic and real-life data and demonstrate that our approach outperforms other embedded methods in terms of predictive performance and feature selection.
- Code: https://anonymous.4open.science/r/6c6fc22c-ec36-4834-89e2-e2606a4ee2ad/