Deep unsupervised feature selection

Ian Covert; Uygar Sumbul; Su-In Lee

25 Sept 2019 (modified: 05 May 2023) | ICLR 2020 Conference Blind Submission | Readers: Everyone

Keywords: single-cell RNA, microarray, feature selection, feature ranking

TL;DR: We select features that perform well in downstream prediction tasks by learning a "restricted autoencoder" that iteratively eliminates features that are not necessary for accurate reconstruction.

Abstract: Unsupervised feature selection involves finding a small number of highly informative features in the absence of a specific supervised learning task. Selecting a small number of features is an important problem in many scientific domains with high-dimensional observations. Here, we propose the restricted autoencoder (RAE) framework for selecting features that can accurately reconstruct the rest of the features. We justify our approach through a novel proof that the reconstruction ability of a set of features bounds its performance in downstream supervised learning tasks. Based on this theory, we present a learning algorithm for RAEs that iteratively eliminates features using learned per-feature corruption rates. We apply the RAE framework to two high-dimensional biological datasets (single-cell RNA sequencing and microarray gene expression data, which pose important problems in cell biology and precision medicine) and demonstrate that RAEs outperform nine baseline methods, often by a large margin.
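
As a rough illustration of the selection criterion in the abstract (keep the features that best reconstruct all the others), the sketch below runs greedy backward elimination with a linear least-squares reconstructor. It is not the RAE algorithm: the abstract describes an autoencoder with learned per-feature corruption rates, and those details are not given on this page. The function names `reconstruction_error` and `backward_eliminate`, the linear model, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def reconstruction_error(X, selected):
    """Mean squared error of linearly reconstructing every column of X
    from the columns in `selected` (no intercept; assumes centered data)."""
    Xs = X[:, selected]
    # Least-squares weights mapping the selected columns to all columns.
    W, *_ = np.linalg.lstsq(Xs, X, rcond=None)
    return float(np.mean((Xs @ W - X) ** 2))


def backward_eliminate(X, k):
    """Greedy stand-in for iterative elimination (hypothetical helper):
    repeatedly drop the feature whose removal increases the
    reconstruction error the least, until k features remain."""
    selected = list(range(X.shape[1]))
    while len(selected) > k:
        errors = [
            reconstruction_error(X, [f for f in selected if f != candidate])
            for candidate in selected
        ]
        selected.pop(int(np.argmin(errors)))
    return selected


# Synthetic data (an assumption): 3 independent features plus
# 3 features that are exact linear combinations of them.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
X = np.hstack([base, base @ rng.normal(size=(3, 3))])

selected = backward_eliminate(X, k=3)
final_error = reconstruction_error(X, selected)
```

Because the six synthetic columns span only three dimensions, three retained features are enough to reconstruct the rest almost exactly. Real expression data would call for a nonlinear reconstructor and a cheaper elimination schedule than this one-feature-at-a-time loop.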
