Keywords: Heterogeneous Graphs, Class Imbalance, Heterogeneous Graph Neural Networks, Oversampling, Classification
TL;DR: HetGSMOTE combats class imbalance in heterogeneous graphs by generating structure-aware synthetic nodes, boosting node classification across models and datasets.
Abstract: Graph Neural Networks (GNNs) have proven effective for learning from graph-structured data, and heterogeneous graphs (HetGs) have gained particular prominence for their ability to model diverse real-world systems through multiple node and edge types. However, class imbalance, where certain node classes are significantly underrepresented, presents a critical challenge for node classification on HetGs, because traditional learning approaches fail to handle minority classes adequately. This work introduces HetGSMOTE, a novel oversampling framework that extends SMOTE-based techniques to heterogeneous graphs by systematically incorporating node-type, edge-type, and metapath information into synthetic sample generation. HetGSMOTE uses a base model to construct a content-aggregated and neighbor-type-aggregated embedding space, then generates synthetic minority nodes in that space while training a specialized edge generator for each node type to preserve essential relational structures. In comprehensive experiments across multiple benchmark datasets and base models, HetGSMOTE consistently outperforms existing baseline methods, with substantial improvements in classification performance under various imbalance scenarios, particularly extreme imbalance, while remaining compatible with different heterogeneous graph neural network architectures. We release our code and data preparations at [github.com/smlab-niser/hetgsmote](https://github.com/smlab-niser/hetgsmote).
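To illustrate the SMOTE-style interpolation that the abstract refers to, the following is a minimal sketch and not the released HetGSMOTE implementation. It assumes node embeddings are already available as plain lists of floats, and the function name `smote_interpolate` and its parameters are hypothetical. It covers only the synthetic-node step; the type-aware embedding construction and the per-node-type edge generators are not shown.

```python
import math
import random


def smote_interpolate(embeddings, labels, minority_class, n_new, seed=0):
    """Create n_new synthetic embeddings for minority_class (SMOTE-style).

    Hypothetical helper for illustration only: each synthetic point is a
    random convex combination of a minority embedding and its nearest
    same-class neighbor.
    """
    rng = random.Random(seed)
    minority = [e for e, y in zip(embeddings, labels) if y == minority_class]
    if len(minority) < 2:
        raise ValueError("need at least two minority nodes to interpolate")

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        anchor = minority[i]
        # Nearest same-class neighbor by Euclidean distance, excluding the anchor.
        neighbor = min(
            (m for j, m in enumerate(minority) if j != i),
            key=lambda m: math.dist(anchor, m),
        )
        delta = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + delta * (b - a) for a, b in zip(anchor, neighbor)])
    return synthetic


# Toy example: class 1 is the minority class with two nodes.
embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
labels = [0, 0, 0, 1, 1]
new_nodes = smote_interpolate(embeddings, labels, minority_class=1, n_new=3)
```

In this toy example every synthetic embedding lies on the segment between the two minority points. In a heterogeneous graph, each synthetic node would additionally need edges to nodes of every relevant type, which is the role the abstract assigns to the edge generators.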
Git: https://github.com/smlab-niser/hetgsmote
Submission Number: 13