Keywords: feature selection, attention
TL;DR: Sequential feature selection using the attention mechanism, with provable guarantees.
Abstract: Feature selection is the problem of selecting a subset of features for a machine learning model
that maximizes model quality subject to a budget constraint.
For neural networks, prior methods, including those based on $\ell_1$ regularization, attention, and other techniques,
typically select the entire feature subset in one evaluation round,
ignoring the residual value of features during selection,
i.e., the marginal contribution of a feature given that other features have already been selected.
We propose a feature selection algorithm called Sequential Attention that achieves state-of-the-art empirical results
for neural networks.
This algorithm is based on an efficient one-pass implementation of greedy forward selection
and uses attention weights at each step as a proxy for feature importance.
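To make the procedure concrete, here is a minimal PyTorch sketch of that idea, not the paper's exact implementation: in each round, softmax attention logits over the remaining candidate features are trained jointly with the model, and the candidate with the largest attention weight is greedily added to the selected set. The linear predictor, optimizer, and hyperparameters below are illustrative assumptions.

```python
import torch
import torch.nn as nn

def sequential_attention(X, y, k, epochs_per_round=200, lr=1e-2):
    n, d = X.shape
    selected = []
    model = nn.Linear(d, 1)                # stand-in predictor; any differentiable model works
    logits = nn.Parameter(torch.zeros(d))  # trainable attention logits, one per feature
    opt = torch.optim.Adam(list(model.parameters()) + [logits], lr=lr)
    loss_fn = nn.MSELoss()

    for _ in range(k):
        candidates = [i for i in range(d) if i not in selected]
        for _ in range(epochs_per_round):
            # Already-selected features enter at full weight; the remaining
            # candidates compete through a softmax over their attention logits.
            w = torch.zeros(d)
            if selected:
                w[selected] = 1.0
            w[candidates] = torch.softmax(logits[candidates], dim=0)
            opt.zero_grad()
            loss = loss_fn(model(X * w).squeeze(-1), y)
            loss.backward()
            opt.step()
        # Greedy step: add the candidate with the largest attention weight.
        with torch.no_grad():
            best = torch.softmax(logits[candidates], dim=0).argmax()
        selected.append(candidates[int(best)])
    return selected
```

For example, on synthetic data such as `X = torch.randn(256, 20)` with `y = X[:, 3] - 2 * X[:, 7] + 0.1 * torch.randn(256)`, one would expect `sequential_attention(X, y, k=2)` to recover features 3 and 7. Note that the attention logits are warm-started across rounds, reflecting the one-pass nature of the algorithm.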
We give theoretical insights into our algorithm for linear regression
by showing that an adaptation of it to this setting is equivalent to the
classical Orthogonal Matching Pursuit (OMP) algorithm,
and thus inherits all of its provable guarantees.
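For reference, the greedy step of OMP selects the feature whose column is most correlated with the current residual (this is the standard formulation; the notation here is ours, not the abstract's):

$$ r_t = y - X_{S_t} \hat{\beta}_t, \qquad \hat{\beta}_t = \arg\min_{\beta} \lVert y - X_{S_t} \beta \rVert_2^2, \qquad i_{t+1} = \arg\max_{i \notin S_t} \frac{\lvert \langle x_i, r_t \rangle \rvert}{\lVert x_i \rVert_2}, $$

where $X_{S_t}$ denotes the columns of $X$ indexed by the currently selected set $S_t$. Intuitively, the attention weights in the adapted algorithm play the role of these residual correlations.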
Our theoretical and empirical analyses offer new explanations for the effectiveness of attention
and its connections to overparameterization, which may be of independent interest.
Submission Number: 11