Keywords: feature selection, variable selection, knockoff variables, supervised learning
TL;DR: We propose a feature selection algorithm for supervised learning inspired by the recently introduced knockoff framework for variable selection in statistical regression.
Abstract: We propose a feature selection algorithm for supervised learning inspired by the recently introduced
knockoff framework for variable selection in statistical regression. While variable selection in statistics aims
to distinguish between true and false predictors, feature selection in machine learning aims to reduce the
dimensionality of the data while preserving the performance of the learning method. The knockoff framework
has attracted significant interest due to its strong control of false discoveries while preserving predictive
power. In contrast to the original approach and later variants that assume a given probabilistic model for the
variables, our proposed approach relies on data-driven generative models that learn mappings from data
space to a parametric space that characterizes the probability distribution of the data. Our approach
requires only the availability of mappings from data space to a distribution in parametric space and from
parametric space to a distribution in data space; thus, it can be integrated with multiple popular generative
models from machine learning. We provide example knockoff designs using a variational autoencoder and
a Gaussian process latent variable model. We also propose a knockoff score metric for a softmax classifier
that accounts for the contribution of each feature and its knockoff during supervised learning. Experimental
results with multiple benchmark datasets for feature selection showcase the advantages of our knockoff
designs and the knockoff framework with respect to existing approaches.
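The abstract outlines a two-step recipe: generate knockoff copies of the features through a learned latent-variable model, then score each feature against its knockoff using a classifier trained on the augmented design. The sketch below illustrates that recipe under heavy simplification — a PCA-style linear encoder/decoder stands in for the paper's VAE or GPLVM, the classifier is a plain logistic (two-class softmax) model fit by gradient descent, and the latent dimension, learning rate, and selection threshold are all hypothetical choices, not the paper's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples, 10 features; only the first 3 drive the label.
n, d = 200, 10
X = rng.standard_normal((n, d))
y = (X[:, :3].sum(axis=1) > 0).astype(float)

# --- Step 1: knockoff generation via a latent-variable model ---
# Encode each sample into a low-dimensional latent space, then decode with
# fresh noise. The knockoffs X_tilde share the latent structure of X but
# carry no extra information about y. (PCA is a crude linear stand-in for
# the VAE / GP-LVM mappings described in the abstract.)
k = 3                                    # latent dimension (assumed)
mu = X.mean(axis=0)
Xc = X - mu
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:k].T                        # "encoder": data -> latent
recon = Z @ Vt[:k]                       # "decoder" mean in data space
resid_std = (Xc - recon).std(axis=0)     # per-feature residual scale
X_tilde = recon + rng.standard_normal((n, d)) * resid_std + mu

# --- Step 2: knockoff score from a linear classifier ---
# Train on the augmented design [X, X_tilde]; score each feature by how much
# its weight dominates its knockoff's weight: W_j = |w_j| - |w_{j+d}|.
A = np.hstack([Xc, X_tilde - X_tilde.mean(axis=0)])
w = np.zeros(2 * d)
for _ in range(500):                     # gradient descent on logistic loss
    p = 1.0 / (1.0 + np.exp(-(A @ w)))
    w -= 0.1 * A.T @ (p - y) / n
W = np.abs(w[:d]) - np.abs(w[d:])

# Select features whose score clears a data-dependent threshold (here the
# largest magnitude among knockoff-dominated scores, a rough proxy for the
# knockoff-filter threshold, which is chosen to control the FDR).
threshold = np.abs(W[W < 0]).max() if (W < 0).any() else 0.0
selected = np.where(W > threshold)[0]
print("knockoff scores:", np.round(W, 3))
print("selected features:", selected)
```

In the full framework the threshold is set to provably control the false discovery rate, and the encoder/decoder pair can be any generative model that maps between data space and a parametric latent space, which is what lets the approach plug into VAEs and Gaussian process latent variable models interchangeably.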