Graph Representation Learning enhanced Semi-supervised Feature Selection

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Feature Selection; Graph Representation Learning; Batch Attention
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: A graph-representation-enhanced feature selection method
Abstract: Feature selection is essential in machine learning: it discovers the features most relevant to the modeling target. By exploring the potentially complex correlations among features of unlabeled data, recently introduced self-supervision-enhanced feature selection methods greatly reduce the reliance on labeled samples. However, they are generally built on autoencoders with sample-wise self-supervision and can hardly exploit relations among samples. To address this limitation, this paper proposes Graph representation learning enhanced Semi-supervised Feature Selection (G-FS), which performs feature selection by discovering and exploiting the non-Euclidean relations among features and samples, translating unlabeled "plain" tabular data into a bipartite graph. A self-supervised edge-prediction task is designed to distill the rich information on the graph into low-dimensional embeddings that remove redundant features and noise. Guided by the condensed graph representation, we propose a batch-attention feature weight generation mechanism that produces more robust weights based on batch-level selection patterns rather than individual samples. The results show that G-FS achieves significant performance gains on 12 datasets compared to ten state-of-the-art baselines, including two recent self-supervised baselines.
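(The following is a minimal illustrative sketch, not the authors' implementation: it shows one plausible way to translate a tabular matrix into a sample-feature bipartite graph and score edges with a dot-product decoder for a self-supervised edge-prediction task, as described in the abstract. All names, thresholds, and embedding sizes here are assumptions.)

```python
# Hypothetical sketch of the bipartite-graph view of tabular data assumed in G-FS:
# samples and features become the two node sets; an edge links sample i to feature j
# when X[i, j] is deemed informative. Edge scores from low-dimensional embeddings
# would then serve as the self-supervised edge-prediction signal.
import numpy as np

def build_bipartite_edges(X, threshold=0.0):
    """Return (sample_idx, feature_idx) pairs for entries of X above `threshold`."""
    rows, cols = np.nonzero(X > threshold)
    return np.stack([rows, cols], axis=1)

def edge_logits(sample_emb, feature_emb, edges):
    """Dot-product decoder: score of edge (i, j) = <sample_emb[i], feature_emb[j]>."""
    return np.einsum("ij,ij->i", sample_emb[edges[:, 0]], feature_emb[edges[:, 1]])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((8, 5))                 # toy "plain" tabular data: 8 samples, 5 features
    edges = build_bipartite_edges(X, 0.5)  # sample-feature edges of the bipartite graph
    s_emb = rng.normal(size=(8, 16))       # low-dimensional sample embeddings (placeholder)
    f_emb = rng.normal(size=(5, 16))       # low-dimensional feature embeddings (placeholder)
    print(edges.shape, edge_logits(s_emb, f_emb, edges).shape)
```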
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4844