Sparse Labels Node Classification: Unsupervised Learning for Mentoring Supervised Learning in Sparse Label Settings

17 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Graphs; Semi-Supervised Node Classification; Unsupervised learning; Clustering; Sparse Labels Setting
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Estimating Label Information (ELI) for Sparse Labels Node Classification (SLNC).
Abstract: Despite their huge success, Graph Neural Networks (GNNs) still require many labeled examples per class at training time in order to perform well on the Semi-Supervised Node Classification (SSNC) task. This is a major drawback, since labels are usually expensive and time-consuming to obtain. Although several attempts have been made to address this problem, most still require a significant number of labeled examples for at least some classes (considered base classes), as well as a minimum number of labels per class for the remaining classes. In this work, we attempt to relax these hard requirements. Our problem thus differs from the traditional SSNC setting: we address the setting in which only extremely few labeled nodes are seen at training time and, in addition, these labeled nodes are not provided (chosen) on a per-class basis. We name this task Sparse Labels Node Classification (SLNC). To address it, we Estimate Label Information (ELI) from a pseudo space by leveraging unsupervised learning techniques. We use this estimated label information to enhance reformulations of well-known semi-supervised learning (SSL) frameworks, as well as to guide the selection of labeled nodes for training. We show that our approach outperforms baselines on SLNC by 10-20% when the number of labeled nodes seen at training is extremely small.
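The abstract does not spell out the concrete ELI procedure, so the following is only a minimal illustrative sketch of the general idea: derive an unsupervised "pseudo space" from the graph, cluster it to obtain estimated label information, and use cluster structure to guide which nodes to label. The function names (`estimate_label_information`, `select_nodes_to_label`) and the specific choice of spectral embedding plus k-means are assumptions for illustration, not the authors' method.

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding
from sklearn.cluster import KMeans

def estimate_label_information(adj, n_classes, dim=8, seed=0):
    """Illustrative stand-in for ELI: embed the graph without labels,
    then cluster the embedding to estimate label information."""
    # Unsupervised "pseudo space": spectral embedding of the adjacency
    # (treated as a precomputed affinity matrix).
    emb = SpectralEmbedding(
        n_components=dim, affinity="precomputed", random_state=seed
    ).fit_transform(adj)
    # Cluster assignments serve as estimated (pseudo) label information.
    km = KMeans(n_clusters=n_classes, n_init=10, random_state=seed).fit(emb)
    return emb, km.labels_, km.cluster_centers_

def select_nodes_to_label(emb, pseudo_labels, centers, budget):
    """Guide labeled-node selection: rank nodes within each cluster by
    distance to the centroid, then pick round-robin across clusters."""
    order = []
    for c in range(centers.shape[0]):
        idx = np.where(pseudo_labels == c)[0]
        dist = np.linalg.norm(emb[idx] - centers[c], axis=1)
        order.append(idx[np.argsort(dist)])
    picks, rank = [], 0
    max_len = max(len(r) for r in order)
    while len(picks) < budget and rank < max_len:
        for ranked in order:
            if rank < len(ranked) and len(picks) < budget:
                picks.append(int(ranked[rank]))
        rank += 1
    return np.array(picks)

# Toy usage: two loosely connected blocks of 5 nodes each.
A = np.zeros((10, 10))
A[:5, :5] = 1
A[5:, 5:] = 1
A[4, 5] = A[5, 4] = 1
np.fill_diagonal(A, 0)
emb, pl, ctr = estimate_label_information(A, n_classes=2, dim=2)
print(select_nodes_to_label(emb, pl, ctr, budget=2))
```

In this sketch the selection step favors cluster-central nodes, one per estimated class, which matches the abstract's setting where labeled nodes are not provided on a per-class basis and must therefore be chosen to cover the class structure.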
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 869