Exploring Neural Scaling Law and Data Pruning Methods For Node Classification on Large-scale Graphs

Zhen WANG; Yaliang Li; Bolin Ding; Yule Li; Zhewei Wei

Published: 23 Jan 2024, Last Modified: 23 May 2024

Venue: TheWebConf24 Oral

Keywords: Neural scaling law, Node classification, Data pruning

TL;DR: We study neural scaling law and data pruning methods for node classification tasks on large-scale graphs.

Abstract: Recently, how model performance scales with training sample size has been extensively studied for large models in vision- and language-related domains. Nevertheless, the ubiquitous node classification tasks on web-scale graphs have been ignored, even though the traits of these tasks, such as non-IIDness, the semi-supervised setting, and distribution shift, are likely to cause different scaling laws and motivate novel techniques to beat the law. Therefore, we first explore the neural scaling law for node classification tasks on three large-scale OGB datasets. Then, we benchmark several state-of-the-art data pruning methods on these tasks, not only validating the possibility of exploiting data redundancy to improve on the original unsatisfactory power law but also gaining valuable insights into a hard-and-representative principle for picking an effective subset of training nodes. Moreover, we leverage the semi-supervised setting of node classification to propose a novel data pruning method, which instantiates our principle in a test-set-targeted manner. Our method consistently outperforms related methods on all three datasets. Meanwhile, we use a PAC-Bayesian framework to analyze our method, extending prior results to account for both hardness and representativeness. In addition to offering a promising way to ease GNN training on web-scale graphs, our study sheds light on the relationship between training nodes and GNN generalization.

Track: Graph Algorithms and Learning for the Web

Student Author: No

Submission Number: 1544
