Shift-Robust Node Classification via Graph Clustering Co-training

Qi Zhu; Chao Zhang; Chanyoung Park; Carl Yang; Jiawei Han

Published: 22 Nov 2022, Last Modified: 05 May 2023, NeurIPS 2022 GLFrontiers Workshop

Keywords: node classification, graph neural network, domain adaptation, distribution shift

TL;DR: When the covariate shift assumption does not hold in graph domain adaptation, we propose a co-training paradigm that unifies multiple domain adaptation perspectives; the key is to learn the hidden cluster structure of the target graph.
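For reference, the shift types mentioned on this page can be stated via the factorization $\Pr(X,Y) = \Pr(Y \mid X)\Pr(X)$. These are the standard textbook definitions, not notation taken from the paper; following the abstract's remark, on a graph $X$ should be read as covering both node features and the surrounding graph structure, and $\mathcal{Y}$ denotes the label set.

```latex
\begin{aligned}
&\text{General shift:}    && \Pr_\text{train}(X,Y) \neq \Pr_\text{test}(X,Y) \\
&\text{Covariate shift:}  && \Pr_\text{train}(X) \neq \Pr_\text{test}(X), \quad \Pr_\text{train}(Y \mid X) = \Pr_\text{test}(Y \mid X) \\
&\text{Closed-set shift:} && \mathcal{Y}_\text{test} = \mathcal{Y}_\text{train} \\
&\text{Open-set shift:}   && \mathcal{Y}_\text{test} \not\subseteq \mathcal{Y}_\text{train}
\end{aligned}
```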

Abstract: It is widely known that machine learning models achieve only sub-optimal performance when the test data exhibit a distribution shift from the training data, i.e., $\Pr_\text{train}(X,Y) \neq \Pr_\text{test}(X,Y)$. Although Graph Neural Networks (GNNs) have become the de facto models for semi-supervised learning tasks, they suffer even more from distribution shift, because shifts can originate not only from node features but also from graph structure. Existing domain adaptation methods each work only for a specific type of shift. In response, we propose Shift-Robust Node Classification (SRNC), a unified domain adaptation framework for different kinds of distribution shift on graphs. Specifically, we co-train an unsupervised cluster GNN, which uses graph homophily to capture the data distribution of the target graph. A shift-robust classifier is then optimized on the training graph together with pseudo-labeled samples from the target graph, which the cluster GNN provides. Compared to existing domain adaptation algorithms on graphs, our approach works for both open-set and closed-set shifts, with convergence guarantees. In our experiments, SRNC improves classification accuracy by at least $3\%$ over the second-best baseline under open-set shifts. On time-evolving graphs with closed-set shift, existing domain adaptation algorithms barely improve generalization and can even make it worse, whereas SRNC still mitigates the negative effect of the shift ($>2\%$ absolute improvement) across different test times.
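The abstract describes the co-training loop only at a high level, so the following is a hypothetical, heavily simplified sketch of the general idea (a cluster view of the unlabeled target graph plus a classifier trained on source labels and target pseudo-labels on which both views agree). It is not the authors' implementation. Assumptions: parameter-free feature propagation stands in for the GNN encoders; k-means on propagated target features stands in for the learned cluster GNN (so here the cluster view is computed once rather than co-trained); multinomial logistic regression stands in for the shift-robust classifier; the number of clusters equals the number of classes (closed-set); and all names (`propagate`, `co_train`, `two_clique_graph`) and hyperparameters (`rounds`, `conf_thresh`, the 0.4 feature offset) are made up for illustration.

```python
import numpy as np

# Hypothetical sketch, NOT SRNC itself: k-means and logistic regression are stand-ins
# for the paper's cluster GNN and shift-robust classifier.


def normalize_adj(A):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def propagate(A, X, hops=2):
    """Parameter-free stand-in for a GNN encoder: smooth node features over the graph."""
    S = normalize_adj(A)
    H = X
    for _ in range(hops):
        H = S @ H
    return H


def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def train_softmax(H, y, n_classes, epochs=300, lr=0.5):
    """Multinomial logistic regression fitted by full-batch gradient descent."""
    W = np.zeros((H.shape[1], n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]
    for _ in range(epochs):
        G = (softmax(H @ W + b) - Y) / len(H)  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * H.T @ G
        b -= lr * G.sum(axis=0)
    return W, b


def kmeans(H, k, iters=50, seed=0):
    """Plain k-means; returns a cluster index per node."""
    rng = np.random.default_rng(seed)
    centers = H[rng.choice(len(H), size=k, replace=False)]
    for _ in range(iters):
        dists = ((H[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for c in range(k):
            if np.any(assign == c):
                centers[c] = H[assign == c].mean(axis=0)
    return assign


def co_train(A_s, X_s, y_s, A_t, X_t, n_classes, rounds=3, conf_thresh=0.9):
    H_s, H_t = propagate(A_s, X_s), propagate(A_t, X_t)
    clusters = kmeans(H_t, n_classes)          # cluster view of the unlabeled target graph
    W, b = train_softmax(H_s, y_s, n_classes)  # classifier view, source labels only
    for _ in range(rounds):
        P_t = softmax(H_t @ W + b)
        pred = P_t.argmax(axis=1)
        # Name each target cluster after the class most of its members are predicted as.
        pseudo = np.zeros(len(H_t), dtype=int)
        for c in range(n_classes):
            members = clusters == c
            if members.any():
                pseudo[members] = np.bincount(pred[members], minlength=n_classes).argmax()
        # Keep target nodes where both views agree and the classifier is confident.
        keep = (pred == pseudo) & (P_t.max(axis=1) >= conf_thresh)
        W, b = train_softmax(np.vstack([H_s, H_t[keep]]),
                             np.concatenate([y_s, pseudo[keep]]), n_classes)
    return softmax(H_t @ W + b).argmax(axis=1)


def two_clique_graph(m):
    """Toy graph: two disconnected m-node cliques (one per class)."""
    A = np.zeros((2 * m, 2 * m))
    A[:m, :m] = 1.0
    A[m:, m:] = 1.0
    np.fill_diagonal(A, 0.0)
    return A


# Toy covariate shift: same structure, target features translated along the first axis.
rng = np.random.default_rng(0)
m = 20
y_true = np.repeat([0, 1], m)
A = two_clique_graph(m)
base = np.where(y_true == 0, -1.0, 1.0)
X_s = np.column_stack([base, np.zeros(2 * m)]) + rng.normal(0.0, 0.3, (2 * m, 2))
X_t = np.column_stack([base + 0.4, np.zeros(2 * m)]) + rng.normal(0.0, 0.3, (2 * m, 2))
pred_t = co_train(A, X_s, y_true, A, X_t, n_classes=2)
print("target accuracy:", (pred_t == y_true).mean())
```

The toy graph consists of two disconnected cliques, so the cluster view recovers the communities exactly; on real graphs, pseudo-label quality depends on how well homophily-based clusters line up with the classes, which is the premise the abstract relies on. Target labels are used here only to report accuracy, never for training.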