Abstract: Graphs provide a natural representation for complex relational systems, including biological interaction networks, social networks, and neural connectomes. A central goal in graph learning is to recover meaningful latent structure from observed patterns of connectivity. In practice, however, this problem is rarely clean [1]. Real-world graphs are often sparse, noisy, and only partially observed, so the recorded edges may reflect an imperfect measurement process rather than the true underlying structure [2]. As a result, extracting reliable signal from graph data is fundamentally challenging and highly relevant across many scientific and applied domains [1]. This challenge matters because many of the settings in which graph learning would be most useful are precisely those in which data quality is weakest. In areas such as biology and finance, collecting graph data can be expensive, technically difficult, and prone to measurement error [2, 3]. In these regimes, model performance is shaped not only by expressive power but also by robustness to dataset noise and the ability to learn from limited information [4]. In many situations, simpler methods remain competitive because they are more interpretable [5], less prone to overfitting noise when data is limited [6], and better calibrated [7]. Motivated by this observation, this project studies whether the empirical gains often attributed to graph neural networks (GNNs) persist under controlled increases in structural and node-feature noise on synthetic (LFR and SBM) and real-world datasets, or whether simpler spectral methods become equally effective, or even preferable, when graph data is corrupted. Answering this question is important both scientifically and practically: scientifically, it helps clarify where the performance of modern graph learning models comes from; practically, it helps ensure that a model's complexity is truly justified by the quality of the data.
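As a minimal illustration of the kind of controlled structural-noise perturbation referred to above, the sketch below generates a small stochastic block model (SBM) graph with networkx and corrupts a chosen fraction of its edges by deleting true edges and inserting an equal number of random spurious ones. The block sizes, connection probabilities, noise level, and rewiring scheme are illustrative assumptions, not the project's actual experimental protocol.

```python
import numpy as np
import networkx as nx

# Illustrative sketch (not the project's code): build a two-community SBM graph
# and apply structural noise by removing true edges and adding spurious ones.
# All parameter values below are assumed for demonstration only.

def make_sbm(seed=0):
    sizes = [50, 50]                       # two equally sized communities
    probs = [[0.20, 0.02],                 # dense within-block connectivity,
             [0.02, 0.20]]                 # sparse between-block connectivity
    return nx.stochastic_block_model(sizes, probs, seed=seed)

def add_structural_noise(G, noise_frac=0.3, seed=0):
    """Remove a fraction of edges and add the same number of random non-edges."""
    rng = np.random.default_rng(seed)
    G = G.copy()
    edges = list(G.edges())
    n_noisy = int(noise_frac * len(edges))
    # Drop randomly chosen existing edges.
    drop_idx = rng.choice(len(edges), size=n_noisy, replace=False)
    G.remove_edges_from([edges[i] for i in drop_idx])
    # Add the same number of random spurious edges between distinct nodes.
    nodes = list(G.nodes())
    added = 0
    while added < n_noisy:
        u, v = rng.choice(nodes, size=2, replace=False)
        if not G.has_edge(u, v):
            G.add_edge(u, v)
            added += 1
    return G

clean = make_sbm()
noisy = add_structural_noise(clean, noise_frac=0.3)
print(clean.number_of_edges(), noisy.number_of_edges())
```

An analogous node-feature perturbation could, for example, add zero-mean Gaussian noise of increasing variance to each node's feature vector; that variant is likewise only one plausible instantiation of the feature-noise condition described in the abstract.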