Exploring Neural Scaling Law and Data Pruning Methods For Node Classification on Large-scale Graphs

Published: 23 Jan 2024, Last Modified: 23 May 2024, TheWebConf24 Oral
Keywords: Neural scaling law, Node classification, Data pruning
TL;DR: We study the neural scaling law and data pruning methods for node classification tasks on large-scale graphs.
Abstract: How model performance scales with training sample size has recently been studied extensively for large models in vision- and language-related domains. However, the ubiquitous node classification tasks on web-scale graphs have been overlooked, even though their traits, such as non-IIDness, the semi-supervised setting, and distribution shift, are likely to yield different scaling laws and to motivate novel techniques for beating the law. We therefore first explore the neural scaling law for node classification tasks on three large-scale OGB datasets. We then benchmark several state-of-the-art data pruning methods on these tasks, which both validates the possibility of exploiting data redundancy to improve on the original, unsatisfactory power law and yields valuable insights into a hard-and-representative principle for picking an effective subset of training nodes. Moreover, we leverage the semi-supervised setting of node classification to propose a novel data pruning method that instantiates our principle in a test set-targeted manner. Our method consistently outperforms related methods on all three datasets. We also use a PAC-Bayesian framework to analyze our method, extending prior results to account for both hardness and representativeness. In addition to offering a promising way to ease GNN training on web-scale graphs, our study sheds light on the relationship between training nodes and GNN generalization.
Track: Graph Algorithms and Learning for the Web
Submission Guidelines Scope: Yes
Submission Guidelines Blind: Yes
Submission Guidelines Format: Yes
Submission Guidelines Limit: Yes
Submission Guidelines Authorship: Yes
Student Author: No
Submission Number: 1544
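
The abstract refers to a power law relating training sample size to performance. As a rough illustration only, the sketch below fits a curve of the form error ≈ a · n^(-b) by linear regression in log-log space. The functional form, the function name `fit_power_law`, and the synthetic data are assumptions made for this example; the listing does not state the exact form or any measurements used in the paper.

```python
import math


def fit_power_law(sizes, errors):
    """Fit error ~= a * size**(-b) via least squares on log-log values.

    Hypothetical helper for illustration; not taken from the paper.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    # Ordinary least squares for the line y = intercept + slope * x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log(error) = log(a) - b * log(size), so a = exp(intercept), b = -slope.
    return math.exp(intercept), -slope


# Synthetic data generated from a known power law (made up, not paper results).
sizes = [1_000, 10_000, 100_000, 1_000_000]
errors = [0.5 * n ** -0.25 for n in sizes]

a, b = fit_power_law(sizes, errors)
print(f"a = {a:.3f}, b = {b:.3f}")
```

Under this assumed form, a small exponent b means that each tenfold increase in training nodes buys only a modest drop in error, which is the kind of unfavorable scaling that data pruning aims to improve on.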