Keywords: Feature Selection, Sparse Centroid-Encoder, Non-linear Feature Selection, Deep Feature Selection, Multi-class Feature Selection, Iterative Feature Selection
Abstract: We develop a sparse optimization problem for determining the total set of features that discriminate two or more classes. This is a sparse implementation of the centroid-encoder for nonlinear data reduction and visualization, called the Sparse Centroid-Encoder (SCE). We also provide an iterative feature selection algorithm that first ranks each feature by its occurrence and then chooses the optimal number of features using a validation set. The algorithm is applied to a wide variety of data sets, including single-cell biological data, high-dimensional infectious disease data, hyperspectral data, image data, and GIS data. We compared our method to various state-of-the-art feature selection techniques, including three neural network-based models (DFS, SG-L1-NN, G-L1-NN), Sparse SVM, and Random Forest. We empirically showed that SCE features produced better classification accuracy on unseen test data, often with fewer features.
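The ranking-then-validation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the SCE training itself is not shown, the input `selections` (one set of selected feature indices per run) is a hypothetical stand-in for whatever the sparse model returns, and `validation_score` is a hypothetical callback that would score a candidate feature subset on a held-out validation set.

```python
from collections import Counter
from typing import Callable, Iterable, List, Sequence, Tuple


def rank_by_occurrence(selections: Iterable[Iterable[int]]) -> List[int]:
    """Rank feature indices by how many runs selected them, most frequent first."""
    counts: Counter = Counter()
    for run in selections:
        counts.update(set(run))  # count each feature at most once per run
    # Sort by descending count; break ties by feature index for determinism.
    return sorted(counts, key=lambda f: (-counts[f], f))


def choose_num_features(
    ranked: Sequence[int],
    validation_score: Callable[[Sequence[int]], float],
) -> Tuple[int, float]:
    """Return (k, score) for the top-k prefix of `ranked` with the best score.

    `validation_score` is an assumed callback, e.g. validation accuracy of a
    classifier trained on the given features. Ties keep the smaller k.
    """
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked) + 1):
        score = validation_score(ranked[:k])
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Toy usage: three hypothetical runs, each returning selected feature indices.
runs = [[0, 3, 5], [3, 5, 7], [3, 0, 9]]
ranked = rank_by_occurrence(runs)  # [3, 0, 5, 7, 9]


def toy_score(features: Sequence[int]) -> float:
    # Stand-in for validation accuracy: reward features 3 and 5,
    # penalize subset size.
    return 10 * len(set(features) & {3, 5}) - len(features)


best_k, best_score = choose_num_features(ranked, toy_score)  # (3, 17)
```

Scoring only prefixes of the ranked list keeps the search linear in the number of candidate features, at the cost of never considering subsets that skip a highly ranked feature.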
One-sentence Summary: Feature Selection Using Sparse Centroid-encoder.
Supplementary Material: zip