An Instance-Level Framework for Multi-tasking Graph Self-Supervised Learning

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Graph Self-supervised Learning, Graph Knowledge Distillation
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: In this paper, we propose a novel multi-teacher knowledge distillation framework for instance-level multi-tasking graph self-supervised learning.
Abstract: With hundreds of graph self-supervised pretext tasks proposed over the past few years, the field has matured considerably, and the key challenge is no longer designing ever more powerful but complex pretext tasks, but making more effective use of those already at hand. Pioneering works such as AutoSSL and ParetoGNN balance multiple pretext tasks through global loss weighting in the pre-training phase. Despite their success, several tricky challenges remain: (i) they ignore instance-level requirements, i.e., different instances (nodes) may require localized combinations of tasks; (ii) they scale poorly to emerging tasks, i.e., all task losses must be re-weighted alongside each new task and the model pre-trained from scratch; (iii) there is no theoretical guarantee of benefiting from more tasks, i.e., more tasks do not necessarily lead to better performance. To address these issues, we propose a novel multi-teacher knowledge distillation framework for instance-level Multi-tasking Graph Self-Supervised Learning (MGSSL), which trains multiple teachers with different pretext tasks, integrates the knowledge of the teachers for each instance separately through two parameterized knowledge integration schemes (MGSSL-TS and MGSSL-LF), and finally distills it into a student model. This framework shifts the trade-off among multiple pretext tasks from loss weighting in the pre-training phase to knowledge integration in the fine-tuning phase, making it compatible with an arbitrary number of pretext tasks without pre-training the entire model from scratch. Furthermore, we theoretically justify that MGSSL has the potential to benefit from a wider range of teachers (tasks). Extensive experiments show that combining a few simple but classical pretext tasks yields performance comparable to state-of-the-art competitors.
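Since the abstract only sketches the pipeline, the following is a minimal, hedged PyTorch illustration of the instance-level "integrate teacher knowledge per node, then distill into a student" idea. The gating module, the distillation loss, and all identifiers (e.g., InstanceGate, distill_step) are hypothetical stand-ins and are not the authors' MGSSL-TS/MGSSL-LF formulations or implementation.

```python
# Minimal sketch (not the paper's code): per-node softmax weights over K frozen
# teachers produce an integrated target embedding, which the student matches.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InstanceGate(nn.Module):
    """Hypothetical parameterized integrator: per-node weights over K teachers."""
    def __init__(self, dim: int, num_teachers: int):
        super().__init__()
        self.score = nn.Linear(dim, num_teachers)

    def forward(self, student_emb: torch.Tensor) -> torch.Tensor:
        # [N, K]: each node (instance) gets its own weights over the teachers.
        return F.softmax(self.score(student_emb), dim=-1)

def distill_step(student_emb, teacher_embs, gate):
    """One distillation step: combine frozen teacher embeddings per node,
    then pull the student embedding toward the integrated target."""
    # teacher_embs: [K, N, D] stacked outputs of K teachers, one per pretext task.
    weights = gate(student_emb)                              # [N, K]
    target = torch.einsum('nk,knd->nd', weights, teacher_embs)  # [N, D]
    # Teachers carry no gradient; the student and the gate are trained jointly.
    return F.mse_loss(student_emb, target)

# Toy usage with random tensors standing in for real GNN outputs.
N, D, K = 32, 64, 3
student_emb = torch.randn(N, D, requires_grad=True)
teacher_embs = torch.randn(K, N, D)   # embeddings from K pretext-task teachers
gate = InstanceGate(D, K)
loss = distill_step(student_emb, teacher_embs, gate)
loss.backward()
```

Because the trade-off happens in this gating step rather than in pre-training loss weights, adding a new pretext task would only require stacking one more frozen teacher embedding, which is the scalability property the abstract claims.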
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5122