Keywords: text-attributed graph; imbalanced graph learning; large language models
Abstract: Real-world graph data often follows long-tailed distributions, making it difficult for Graph Neural Networks to generalize well across both head and tail classes. Recent advances in Vicinal Risk Minimization (VRM) have shown promise in mitigating class imbalance with numeric interpolation; however, existing approaches largely rely on embedding-space arithmetic, which fails to capture the rich semantics inherent in text-attributed graphs.
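For concreteness, numeric interpolation in the VRM sense means mixup-style blending of embeddings and labels. A minimal sketch, assuming NumPy node embeddings (the function name and parameters are illustrative, not from the paper):

```python
import numpy as np

def mixup_vicinal(z_i, z_j, y_i, y_j, alpha=0.2, rng=None):
    """Mixup-style vicinal sample: draw lambda ~ Beta(alpha, alpha) and
    linearly blend two node embeddings and their one-hot labels."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    z_new = lam * z_i + (1.0 - lam) * z_j  # pure embedding-space arithmetic
    y_new = lam * y_i + (1.0 - lam) * y_j  # soft label for the synthetic node
    return z_new, y_new
```

Because the blend operates only on numeric vectors, it cannot exploit the node text itself, which is the gap the method below targets.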
In this work, we propose $\textbf{SaVe-TAG}$ ($\textbf{S}$emantic-$\textbf{a}$ware $\textbf{V}$icinal Risk Minimization for Long-Tailed $\textbf{T}$ext-$\textbf{A}$ttributed $\textbf{G}$raphs), a novel VRM framework that leverages Large Language Models to perform text-level interpolation, generating on-manifold, boundary-enriching synthetic samples for minority classes. To mitigate the risk of noisy generation, we introduce a confidence-based edge assignment scheme that uses graph topology as a natural filter to ensure structural consistency.
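The abstract does not spell out the edge-assignment step; one plausible sketch, assuming a link predictor `edge_scorer` trained on the original graph (all names here are hypothetical), is to keep only high-confidence edges for each LLM-generated node:

```python
import numpy as np

def confident_edges(z_new, Z, edge_scorer, tau=0.9, k_max=5):
    """Hypothetical confidence-based edge assignment: connect a synthetic
    node (embedding z_new) only to existing nodes whose predicted link
    probability clears the threshold tau, keeping at most k_max edges."""
    scores = np.array([edge_scorer(z_new, z) for z in Z])  # P(edge) in [0, 1]
    keep = np.flatnonzero(scores >= tau)                   # topology as a filter
    ranked = keep[np.argsort(scores[keep])[::-1]]          # highest confidence first
    return ranked[:k_max].tolist()                         # neighbor indices to wire up
```

Synthetic nodes that attract no confident edges can simply be discarded, which is one way noisy generations get filtered out.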
We provide theoretical justification for our method and conduct extensive experiments on benchmark datasets, showing that our approach consistently outperforms both numeric-interpolation methods and prior long-tailed node classification baselines. Our results highlight the importance of integrating semantic and structural signals for effective learning on text-attributed graphs.
Submission Number: 16