Keywords: dynamic F
Abstract: Dynamic graphs exhibit bursty, intermittent dynamics that standard sequence models capture poorly. We take a signal–statistical view and show that node-wise temporal signals, once transformed into wavelet space, display Pareto-type heavy tails: a small set of high-magnitude coefficients concentrates a large fraction of the total energy. Building on this observation, we introduce Tail-Aware Masking for dynamic GNNs (DGNNs): a simple, plug-in mechanism that retains only the largest-magnitude wavelet coefficients for each node and zeros out the rest before message passing.
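To make the mechanism concrete, here is a minimal sketch of per-node top-$\rho$ masking in wavelet space, following the description above; the library (PyWavelets), the wavelet choice, and all function names are our assumptions for illustration, not the authors' implementation.

```python
import numpy as np
import pywt  # PyWavelets; an assumption, the paper does not name a library

def tail_aware_mask(x, rho=0.1, wavelet="db4"):
    """Keep the top-rho fraction of wavelet coefficients per node, zero the rest.

    x:   (num_nodes, T) array of node-wise temporal signals
    rho: retention ratio in (0, 1]
    Returns the masked signals, reconstructed in the time domain.
    """
    out = np.empty_like(x, dtype=float)
    for i, signal in enumerate(x):
        coeffs = pywt.wavedec(signal, wavelet)        # multi-level DWT
        flat, slices = pywt.coeffs_to_array(coeffs)   # flatten for ranking
        k = max(1, int(rho * flat.size))              # number of coefficients kept
        thresh = np.partition(np.abs(flat), -k)[-k]   # k-th largest magnitude
        flat[np.abs(flat) < thresh] = 0.0             # zero out the bulk
        masked = pywt.array_to_coeffs(flat, slices, output_format="wavedec")
        out[i] = pywt.waverec(masked, wavelet)[: signal.size]
    return out

# The masked signals (or the sparse coefficients themselves) would then be
# passed to the DGNN's message-passing layers.
```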
On the theory side, under a mild regularly varying tail assumption with index $\alpha>2$, we prove that (i) the retained coefficients capture a fraction of the total energy that scales as $\rho^{1-2/\alpha}$ in the retention ratio $\rho$, (ii) masking increases the effective tail index of the features, and (iii) the empirical Rademacher complexity and the generalisation gap of the resulting hypothesis class contract at rate $\mathcal{O}\!\big(\rho^{\frac{1}{2}-\frac{1}{\alpha}}/\sqrt{nT}\big)$. These results formalise why sparse, tail-focused representations improve sample efficiency.
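A back-of-envelope version of claim (i), as our illustration of the stated rate rather than the paper's proof: if the ordered coefficient magnitudes follow an exact Pareto law with index $\alpha$, the $i$-th largest magnitude decays like $i^{-1/\alpha}$, so keeping the top $k=\rho N$ of $N$ coefficients retains an energy fraction

```latex
% Ordered Pareto(alpha) magnitudes: |c|_{(i)} \asymp i^{-1/\alpha}.
% Energy of the top rho*N coefficients relative to the total:
\[
  \frac{\sum_{i=1}^{\rho N} i^{-2/\alpha}}{\sum_{i=1}^{N} i^{-2/\alpha}}
  \;\asymp\;
  \frac{(\rho N)^{1-2/\alpha}}{N^{1-2/\alpha}}
  \;=\; \rho^{\,1-2/\alpha},
  \qquad \alpha > 2,
\]
% The exponent 1 - 2/alpha lies in (0, 1), so heavier tails (smaller alpha)
% pack more energy into fewer coefficients.
```

Note also that the complexity rate in (iii) satisfies $\rho^{\frac{1}{2}-\frac{1}{\alpha}} = \sqrt{\rho^{1-2/\alpha}}$, consistent with the retained energy entering the bound through an $\ell_2$ norm.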
Empirically, on METR-LA we observe clear heavy tails via survival curves and Q–Q plots, validating the modelling prior. Our tail-aware DGNN consistently outperforms its baseline counterpart, yielding substantial reductions in MSE and gains on tail-sensitive metrics, while maintaining training stability through a short warmup. The approach is architecture-agnostic, interpretable (the mask exposes the most informative time–node events), and requires minimal tuning. Together, our findings connect a robust statistical phenomenon of dynamic graph signals to concrete architectural choices and provable generalisation benefits.
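The abstract does not include the diagnostics themselves; the sketch below shows the two standard checks it names, an empirical survival curve on log–log axes and a Pareto-style Q–Q plot, assuming the wavelet coefficients have been collected into a 1-D array (the function name and the top-decile tail cutoff are hypothetical choices).

```python
import numpy as np
import matplotlib.pyplot as plt

def heavy_tail_diagnostics(coeffs):
    """Survival curve (log-log) and Pareto Q-Q plot for |wavelet coefficients|."""
    mags = np.sort(np.abs(coeffs))[::-1]        # magnitudes, descending
    n = mags.size
    survival = np.arange(1, n + 1) / n          # empirical P(|c| >= mags[i])

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))

    # Log-log survival curve: a Pareto-type tail appears as an approximately
    # straight segment with slope -alpha.
    ax1.loglog(mags, survival, ".", ms=2)
    ax1.set(xlabel="|coefficient|", ylabel="P(|c| > x)", title="Survival curve")

    # Pareto Q-Q plot over the top decile: log order statistics against
    # -log(i/n); linearity with slope 1/alpha again indicates a regularly
    # varying tail.
    k = n // 10
    ax2.plot(-np.log(np.arange(1, k + 1) / n), np.log(mags[:k]), ".", ms=2)
    ax2.set(xlabel="-log(i/n)", ylabel="log |c|_(i)", title="Pareto Q-Q plot")
    fig.tight_layout()
    return fig
```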
Primary Area: learning theory
Submission Number: 15857