Mitigating Label Noise on Graphs via Topological Curriculum Learning

18 Sept 2023 (modified: 11 Feb 2024) Submitted to ICLR 2024
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Graph neural networks, Noisy labels, Class-conditional Betweenness Centrality
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Despite their success on carefully-annotated benchmarks, the effectiveness of graph neural networks (GNNs) can be considerably impaired in practice, as real-world graph data may be noisily labeled. As a promising way to combat label noise, curriculum learning has gained significant attention for its merit in reducing the influence of noise via a simple yet effective $\textit{easy-to-hard}$ training curriculum. Unfortunately, early studies focus on i.i.d. data, and when moving to non-i.i.d. graph data and GNNs, two notable challenges remain: (1) the inherent over-smoothing effect in GNNs often induces under-confident predictions, which makes it harder to discriminate between easy and hard samples; (2) there is no available measure that accounts for graph structure to promote informative sample selection in curriculum learning. To address this dilemma, we propose a novel robust measure called $\textit{Class-conditional Betweenness Centrality}$ (CBC), designed to create a curriculum scheme resilient to graph label noise. CBC incorporates topological information to alleviate the over-smoothing issue and enhance the identification of informative samples. On the basis of CBC, we construct a $\textit{Topological Curriculum Learning}$ (TCL) framework that guides model learning towards the clean distribution. We theoretically prove that TCL minimizes an upper bound of the expected risk under the target clean distribution, and experimentally show the superiority of our method over state-of-the-art baselines.
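
For intuition only, the sketch below shows one plausible way a class-conditional betweenness measure could be computed with networkx: betweenness is accumulated only over shortest paths whose endpoints carry different training labels, so nodes sitting between classes score highly. The function name `class_conditional_betweenness`, the `labels` argument, and the pairwise-class restriction are illustrative assumptions, not the paper's exact definition of CBC.

```python
import networkx as nx
from collections import defaultdict

def class_conditional_betweenness(G, labels):
    """Illustrative sketch (assumed definition): betweenness of each node,
    counted only over shortest paths between labeled nodes of *different*
    classes. `labels` maps a labeled node to its (possibly noisy) class id."""
    by_class = defaultdict(list)
    for node, cls in labels.items():
        by_class[cls].append(node)

    cbc = {v: 0.0 for v in G}
    classes = list(by_class)
    for i, ci in enumerate(classes):
        for cj in classes[i + 1:]:
            # Betweenness restricted to paths from class ci sources to class cj targets.
            partial = nx.betweenness_centrality_subset(
                G, sources=by_class[ci], targets=by_class[cj], normalized=False)
            for v, score in partial.items():
                cbc[v] += score
    return cbc
```

Under this assumed reading, nodes with high scores lie on many inter-class shortest paths and would be treated as harder samples later in the curriculum, while low-scoring nodes would be scheduled earlier.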
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1257