Keywords: Feature selection, classification, regression, survival analysis
TL;DR: We develop a fully embedded feature selection method based directly on approximating the $\ell_0$ penalty.
Abstract: Feature selection problems have been extensively studied in the setting of linear estimation, for instance with the LASSO, but less emphasis has been placed on feature selection for non-linear functions. In this study, we propose a method for feature selection in high-dimensional non-linear function estimation problems. The new procedure is based on directly penalizing the $\ell_0$ norm of the features, i.e., the number of selected features. Our $\ell_0$-based regularization relies on a continuous relaxation of the Bernoulli distribution, which allows the model to learn the parameters of the approximate Bernoulli distributions via gradient descent. The proposed framework simultaneously learns a non-linear regression or classification function while selecting a small subset of features. We provide an information-theoretic justification for incorporating the Bernoulli distribution into our approach. Furthermore, we evaluate our method on synthetic and real-world data and demonstrate that it outperforms other embedded methods in both predictive performance and feature selection.
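To make the relaxation the abstract describes concrete, the sketch below implements per-feature gates whose parameters are trained by gradient descent, with the expected number of open gates serving as a differentiable surrogate for the $\ell_0$ penalty. This is a minimal hypothetical PyTorch illustration, not the paper's implementation (see the linked code): the clipped-Gaussian form of the relaxation, the class name `StochasticGates`, and the hyperparameters `sigma` and `lam` are assumptions made for illustration.

```python
import math
import torch
import torch.nn as nn


class StochasticGates(nn.Module):
    """Continuous relaxation of Bernoulli feature gates (illustrative sketch).

    At train time each feature d is multiplied by a gate
    z_d = clamp(mu_d + eps, 0, 1) with eps ~ N(0, sigma^2),
    so gradients flow to the gate parameter mu_d. The expected
    l0 penalty is P(z_d > 0) = Phi(mu_d / sigma), summed over features.
    """

    def __init__(self, n_features, sigma=0.5, lam=0.1):
        super().__init__()
        self.mu = nn.Parameter(0.5 * torch.ones(n_features))
        self.sigma = sigma  # noise scale of the relaxation (assumed)
        self.lam = lam      # regularization strength (assumed)

    def forward(self, x):
        # Inject Gaussian noise during training; use the mean gate at test time.
        eps = self.sigma * torch.randn_like(self.mu) if self.training else 0.0
        z = torch.clamp(self.mu + eps, 0.0, 1.0)  # relaxed Bernoulli gate in [0, 1]
        return x * z

    def l0_penalty(self):
        # Expected number of open gates: sum_d P(mu_d + eps > 0) = Phi(mu_d / sigma).
        gauss_cdf = 0.5 * (1.0 + torch.erf(self.mu / (self.sigma * math.sqrt(2.0))))
        return self.lam * gauss_cdf.sum()


# Usage: gate the inputs of any non-linear predictor and add the penalty
# to the task loss, e.g.
#   gates = StochasticGates(n_features=100)
#   net = nn.Sequential(nn.Linear(100, 32), nn.ReLU(), nn.Linear(32, 1))
#   loss = criterion(net(gates(x)), y) + gates.l0_penalty()
```

Because the penalty is an expectation over the relaxed gates rather than a hard count, it is differentiable in `mu`, which is what lets a single gradient-descent loop fit the predictor and select features simultaneously.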
Code: https://anonymous.4open.science/r/6c6fc22c-ec36-4834-89e2-e2606a4ee2ad/