Exploring Feature Selection With Limited Labels: A Comprehensive Survey of Semi-Supervised and Unsupervised Approaches

Guojie Li, Zhiwen Yu, Kaixiang Yang, Mianfen Lin, C. L. Philip Chen

Published: 01 Jan 2024, Last Modified: 15 Nov 2024. IEEE Trans. Knowl. Data Eng. 2024. CC BY-SA 4.0

Abstract: Feature selection is an active research area in data mining because it improves the efficiency and performance of high-dimensional data analysis by eliminating redundant and irrelevant features. Although data are now easy to acquire, labeling them remains laborious and expensive. To leverage the abundance of unlabeled data, researchers have proposed various feature selection methods that operate with limited labels, including semi-supervised and unsupervised feature selection. However, a thorough review of feature selection algorithms with limited labels is lacking. To bridge this gap, this paper surveys feature selection methods specifically tailored to limited-label scenarios. These methods are systematically classified into two primary categories: semi-supervised and unsupervised feature selection. By introducing a novel taxonomy and discussing future challenges, this survey aims to give researchers an in-depth understanding of feature selection in limited-label scenarios and to offer insights that can guide further research and development in this domain.