SAINT: Structure-Aware Interpolated Text Augmentation for Imbalanced Node Classification on Text-Attributed Graphs

Anonymous Authors (ACL ARR 2025 July Submission 419)

28 Jul 2025 (modified: 19 Aug 2025) · License: CC BY 4.0
Abstract: Imbalanced node classification on text-attributed graphs (TAGs) presents unique challenges due to the scarcity of minority-class nodes and the underutilization of rich textual semantics. While prior work focuses on structural augmentation or shallow text features, it often fails to capture the deep contextual correlations that Large Language Models (LLMs) naturally encode. In this work, we propose \textbf{SAINT} (Structure-Aware Interpolated Text augmentation), a novel framework that leverages LLMs for semantics-preserving minority-node synthesis while maintaining graph structural coherence via a dual-level augmentation strategy. Specifically, we introduce (1) a \emph{structure-aware textual prompt design} that injects neighborhood semantics into LLM text generation, and (2) a contrastive training scheme for a graph-aware link predictor that better preserves topological properties for synthetic nodes. Theoretically, we analyze the semantic consistency and coverage bounds of LLM-augmented nodes under our prompt design. Empirically, our method significantly outperforms prior data-centric augmentation baselines on five real-world TAG datasets under various imbalance ratios. These results highlight the effectiveness of structure-informed LLM augmentation in long-tail graph learning.
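To make the structure-aware prompt design concrete, below is a minimal sketch, assuming a simple neighbor-sampling setup. The function name `build_structure_aware_prompt` and the prompt wording are hypothetical illustrations of how neighborhood texts could be injected into an LLM generation request for a synthetic minority-class node; they are not the paper's released implementation.

```python
# Minimal sketch (hypothetical, not the authors' code): compose an LLM prompt
# that injects the texts of a seed node's graph neighbors, so the generated
# synthetic node text stays consistent with local graph semantics.

def build_structure_aware_prompt(seed_text, neighbor_texts, label, k=3):
    """Build a prompt from a minority-class seed node and up to k of its
    sampled neighbors' texts."""
    neighborhood = "\n".join(f"- {t}" for t in neighbor_texts[:k])
    return (
        f"You are given a node of class '{label}' from a text-attributed graph.\n"
        f"Node text: {seed_text}\n"
        f"Texts of its graph neighbors:\n{neighborhood}\n"
        "Write a new, semantically consistent text for a synthetic node of the "
        "same class that plausibly connects to these neighbors."
    )

# Example usage with toy citation-graph texts:
prompt = build_structure_aware_prompt(
    seed_text="A study of few-shot node classification on citation networks.",
    neighbor_texts=[
        "Graph neural networks for semi-supervised learning.",
        "Prompt-based data augmentation with large language models.",
    ],
    label="minority",
)
print(prompt)
```

The generated text would then be paired with edges proposed by the contrastively trained link predictor described above, so that synthetic nodes receive both semantics and plausible topology.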
Paper Type: Short
Research Area: Machine Learning for NLP
Research Area Keywords: Machine Learning for NLP, Efficient/Low-Resource Methods for NLP, NLP Applications
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 419