Keywords: Synthetic Data Generation, Graph Machine Learning, Generative Models, Diffusion Models
Abstract: Social media bot detection faces persistent data scarcity, as diverse, high-quality labeled datasets become increasingly difficult to obtain. We introduce AURA (Augmented User-graph via Reverse-diffusion Architecture), a novel, model-agnostic pipeline that leverages graph diffusion models to generate realistic synthetic social network data for training augmentation. Although we demonstrate AURA using GraphMaker, a graph-compatible diffusion architecture, the framework can accommodate any suitable generative model. By combining diffusion-based synthetic graph generation with specialized language models, AURA produces synthetic users enriched with both network structure and textual features. Through systematic evaluation on TwiBot-22 under varying levels of data scarcity, we show that synthetic augmentation via AURA consistently improves bot detection performance, delivering robust gains in accuracy, precision, and recall across all tested sample sizes. This work represents the first application of graph diffusion models to social media bot detection and establishes synthetic data generation as a promising direction for overcoming labeled data scarcity in this domain. Preliminary results suggest increasing effectiveness as graph generation capabilities scale.
Submission Number: 68