TL;DR: A novel and efficient end-to-end feature selection method for neural networks. Simple to implement, no alteration of the loss function or model architecture, full control over the number of selected features without tuning or search, matches or outperforms SOTA, with theoretical insights provided.
Abstract: Feature selection is a critical step in data-driven applications, reducing input dimensionality to enhance learning accuracy, computational efficiency, and interpretability. Existing state-of-the-art methods often require post-selection retraining and extensive hyperparameter tuning, complicating their adoption. We introduce a novel, non-intrusive feature selection layer that, given a target feature count $k$, automatically identifies and selects the $k$ most informative features during neural network training. Our method is uniquely simple, requiring no alterations to the loss function, network architecture, or post-selection retraining. The layer is mathematically elegant and can be fully described by:
\begin{align}
\nonumber
\tilde{x}_i = a_i x_i + (1-a_i)z_i
\end{align}
where $x_i$ is the input feature, $\tilde{x}_i$ the output, $z_i$ Gaussian noise, and $a_i$ a trainable gain such that $\sum_i a_i^2 = k$.
This formulation induces an automatic clustering effect, driving $k$ of the $a_i$ gains to $1$ (selecting informative features) and the rest to $0$ (discarding redundant ones) via weighted noise distortion and gain normalization. Despite its extreme simplicity, our method achieves competitive performance on standard benchmark datasets and a novel real-world dataset, often matching or exceeding existing approaches without requiring hyperparameter search for $k$ or retraining. Theoretical analysis in the context of linear regression further validates its efficacy. Our work demonstrates that simplicity and performance are not mutually exclusive, offering a powerful yet straightforward tool for feature selection in machine learning.
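For concreteness, a minimal PyTorch sketch of such a layer is given below. This is an illustrative reconstruction from the equation above, not the released implementation: the class name, the way the constraint $\sum_i a_i^2 = k$ is enforced (re-projection of the gains in the forward pass), the noise variance, and the inference-time behavior are all assumptions.

```python
import torch
import torch.nn as nn

class NoisyGainSelection(nn.Module):
    """Hypothetical sketch of the selection layer:
    x_tilde_i = a_i * x_i + (1 - a_i) * z_i, with trainable gains a_i
    renormalized so that sum_i a_i^2 = k (assumed enforcement strategy)."""

    def __init__(self, num_features: int, k: int, noise_std: float = 1.0):
        super().__init__()
        self.k = k
        self.noise_std = noise_std  # assumed noise scale, not taken from the paper
        # Start all gains equal while already satisfying sum_i a_i^2 = k.
        init = torch.full((num_features,), (k / num_features) ** 0.5)
        self.a = nn.Parameter(init)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Project the gains back onto the constraint sphere sum_i a_i^2 = k.
        a = self.a * (self.k ** 0.5) / self.a.norm(p=2).clamp_min(1e-12)
        if self.training:
            z = torch.randn_like(x) * self.noise_std
            return a * x + (1.0 - a) * z  # noisy mixing during training
        return a * x  # assumption: at inference the noise term is dropped

    def selected_features(self) -> torch.Tensor:
        # The k gains driven closest to 1 index the selected features.
        return torch.topk(self.a.abs(), self.k).indices
```

In keeping with the non-intrusive design described in the abstract, such a layer would simply be prepended to an existing model, e.g. `nn.Sequential(NoisyGainSelection(d, k), backbone)`, and trained with the unchanged task loss.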
Lay Summary: Feature selection is a long-standing and important problem in machine learning, artificial intelligence, signal processing, and related fields. It involves identifying the most relevant input variables to improve model performance, reduce complexity, and enhance interpretability. In our work, we propose a simple and efficient method that adds a special layer at the very beginning of a neural network. This layer performs feature selection automatically during training, without changing the loss function or the rest of the architecture. As a result, by the end of training the network is fully trained and the key features have been selected. Unlike most existing methods, our approach offers direct control over the number of selected features and does not require any search over additional hyperparameters.
Link To Code: https://github.com/csem/SAND
Primary Area: Deep Learning->Algorithms
Keywords: feature selection, machine learning, neural networks
Submission Number: 723