Self-supervised Masked Graph Autoencoder via Structure-aware Curriculum

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 Spotlight Poster · License: CC BY-NC-SA 4.0
Abstract: Self-supervised learning (SSL) on graph-structured data has attracted considerable attention recently. The masked graph autoencoder, a promising generative graph SSL approach that aims to recover masked parts of the input graph, has shown great success on various downstream graph tasks. However, existing masked graph autoencoders fail to account for the varying difficulty of recovering masked edges, which often affects model performance differently and results in suboptimal node representations. To tackle this challenge, we propose a novel curriculum-based self-supervised masked graph autoencoder that captures and leverages the underlying difficulty of the data dependencies hidden in edges, and designs better mask-reconstruction pretext tasks for learning informative node representations. Specifically, we first design a difficulty measurer to identify the underlying structural difficulty of edges during the masking step. Then, we adopt a self-paced scheduler to determine the order in which edges are masked, encouraging the graph encoder to learn from easy to difficult parts. Finally, the masked edges are gradually incorporated into the reconstruction pretext task, leading to high-quality node representations. Experiments on several real-world node classification and link prediction datasets demonstrate the superiority of our proposed method over state-of-the-art graph self-supervised learning baselines. To the best of our knowledge, this work is the first study of curriculum strategies for masked graph autoencoders.
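To make the easy-to-hard masking schedule concrete, below is a minimal NumPy sketch, not the authors' released code. It assumes an illustrative difficulty measurer (inverse endpoint-degree product) as a stand-in for the paper's structure-aware measurer, and a linear self-paced pacing function that gradually admits harder edges into the mask pool; the function names (edge_difficulty, self_paced_mask) and parameters are hypothetical.

```python
import numpy as np

def edge_difficulty(edges, degrees):
    """Assumed structural difficulty proxy: edges between low-degree
    nodes have fewer redundant paths around them and are harder to
    reconstruct, so difficulty is scored as the inverse product of
    endpoint degrees. (The paper's actual measurer may differ.)"""
    u, v = edges[:, 0], edges[:, 1]
    return 1.0 / (degrees[u] * degrees[v])

def self_paced_mask(edges, degrees, epoch, total_epochs, mask_ratio=0.5):
    """Return indices of the edges to mask at this epoch.

    Early epochs sample masks only from the easiest edges; a linear
    pacing function widens the candidate pool over training, so the
    reconstruction pretext task moves from easy to difficult."""
    diff = edge_difficulty(edges, degrees)
    order = np.argsort(diff)                 # easiest -> hardest
    n_mask = int(mask_ratio * len(edges))
    pace = min(1.0, (epoch + 1) / total_epochs)   # fraction admitted
    pool = order[: max(n_mask, int(pace * len(edges)))]
    return np.random.choice(pool, size=n_mask, replace=False)

# Toy usage: a small graph with 6 nodes and 7 undirected edges.
edges = np.array([[0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [0, 5], [1, 4]])
degrees = np.bincount(edges.ravel(), minlength=6)
for epoch in [0, 5, 9]:
    masked = self_paced_mask(edges, degrees, epoch, total_epochs=10)
    print(f"epoch {epoch}: mask edges {sorted(masked.tolist())}")
```

The key design point in this sketch is that the candidate pool, not the mask ratio, grows with training: the number of masked edges stays fixed, but later epochs may draw from structurally harder edges.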
Lay Summary: Graphs such as social networks and citation maps are common in the real world. Teaching AI models to understand them without relying on human-annotated labels is a key challenge. One popular approach trains models by hiding parts of a graph and asking them to predict what is missing. However, it often treats all missing parts as equally difficult, which can limit learning effectiveness. We propose a new method that improves model training by starting with easier tasks and gradually progressing to more difficult ones. Our method estimates how challenging each part of the graph is to recover and adjusts the training schedule accordingly. This progressive learning strategy helps the model build better representations of graph data. Across multiple real-world benchmarks, our method outperforms existing techniques on tasks such as node classification and link prediction. To our knowledge, this is the first work to introduce curriculum-based training in self-supervised learning for graph models.
Primary Area: General Machine Learning->Representation Learning
Keywords: Curriculum Learning
Submission Number: 9548