Keywords: dynamic F
Abstract: Dynamic graphs exhibit bursty, intermittent dynamics that standard sequence models capture poorly. We take a signal–statistical view and show that node-wise temporal signals, once transformed into wavelet space, display Pareto-type heavy tails: a small set of high-magnitude coefficients concentrates a large fraction of the total energy. Building on this observation, we introduce Tail-Aware Masking for dynamic GNNs (DGNNs): a simple, plug-in mechanism that retains only the largest-magnitude wavelet coefficients for each node and zeros out the rest before message passing.
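To make the mechanism concrete, here is a minimal sketch of per-node top-$\rho$ masking in wavelet space, following the description above; the library (PyWavelets), the wavelet choice, and all function names are our assumptions for illustration, not the authors' implementation.

```python
import numpy as np
import pywt  # PyWavelets; an assumption, the paper does not name a library

def tail_aware_mask(x, rho=0.1, wavelet="db4"):
    """Keep the top-rho fraction of wavelet coefficients per node, zero the rest.

    x:   (num_nodes, T) array of node-wise temporal signals
    rho: retention ratio in (0, 1]
    Returns the masked signals, reconstructed in the time domain.
    """
    out = np.empty_like(x, dtype=float)
    for i, signal in enumerate(x):
        coeffs = pywt.wavedec(signal, wavelet)        # multi-level DWT
        flat, slices = pywt.coeffs_to_array(coeffs)   # flatten for ranking
        k = max(1, int(rho * flat.size))              # number of coefficients kept
        thresh = np.partition(np.abs(flat), -k)[-k]   # k-th largest magnitude
        flat[np.abs(flat) < thresh] = 0.0             # zero out the bulk
        masked = pywt.array_to_coeffs(flat, slices, output_format="wavedec")
        out[i] = pywt.waverec(masked, wavelet)[: signal.size]
    return out

# The masked signals (or the sparse coefficients themselves) would then be
# passed to the DGNN's message-passing layers.
```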
On the theory side, under a mild regularly varying tail assumption with index $\alpha>2$, we prove that (i) the retained coefficients capture a fraction of the total energy that scales as $\rho^{1-2/\alpha}$ in the retention ratio $\rho$, (ii) masking increases the effective tail index of the features, and (iii) the empirical Rademacher complexity and the generalisation gap of the resulting hypothesis class contract at rate $\mathcal{O}\!\big(\rho^{\frac{1}{2}-\frac{1}{\alpha}}/\sqrt{nT}\big)$. These results formalise why sparse, tail-focused representations improve sample efficiency.
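A back-of-envelope version of claim (i), as our illustration of the stated rate rather than the paper's proof: if the ordered coefficient magnitudes follow an exact Pareto law with index $\alpha$, the $i$-th largest magnitude decays like $i^{-1/\alpha}$, so keeping the top $k=\rho N$ of $N$ coefficients retains an energy fraction

```latex
% Ordered Pareto(alpha) magnitudes: |c|_{(i)} \asymp i^{-1/\alpha}.
% Energy of the top rho*N coefficients relative to the total:
\[
  \frac{\sum_{i=1}^{\rho N} i^{-2/\alpha}}{\sum_{i=1}^{N} i^{-2/\alpha}}
  \;\asymp\;
  \frac{(\rho N)^{1-2/\alpha}}{N^{1-2/\alpha}}
  \;=\; \rho^{\,1-2/\alpha},
  \qquad \alpha > 2,
\]
% The exponent 1 - 2/alpha lies in (0, 1), so heavier tails (smaller alpha)
% pack more energy into fewer coefficients.
```

Note also that the complexity rate in (iii) satisfies $\rho^{\frac{1}{2}-\frac{1}{\alpha}} = \sqrt{\rho^{1-2/\alpha}}$, consistent with the retained energy entering the bound through an $\ell_2$ norm.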
Empirically, on METR-LA we observe clear heavy tails via survival curves and Q–Q plots, validating the modelling prior. Our tail-aware DGNN consistently outperforms its baseline counterpart, yielding substantial reductions in MSE and gains on tail-sensitive metrics, while maintaining training stability through a short warmup. The approach is architecture-agnostic, interpretable (the mask exposes the most informative time–node events), and requires minimal tuning. Together, our findings connect a robust statistical phenomenon of dynamic graph signals to concrete architectural choices and provable generalisation benefits.
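The abstract does not include the diagnostics themselves; the sketch below shows the two standard checks it names, an empirical survival curve on log–log axes and a Pareto-style Q–Q plot, assuming the wavelet coefficients have been collected into a 1-D array (the function name and the top-decile tail cutoff are hypothetical choices).

```python
import numpy as np
import matplotlib.pyplot as plt

def heavy_tail_diagnostics(coeffs):
    """Survival curve (log-log) and Pareto Q-Q plot for |wavelet coefficients|."""
    mags = np.sort(np.abs(coeffs))[::-1]        # magnitudes, descending
    n = mags.size
    survival = np.arange(1, n + 1) / n          # empirical P(|c| >= mags[i])

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))

    # Log-log survival curve: a Pareto-type tail appears as an approximately
    # straight segment with slope -alpha.
    ax1.loglog(mags, survival, ".", ms=2)
    ax1.set(xlabel="|coefficient|", ylabel="P(|c| > x)", title="Survival curve")

    # Pareto Q-Q plot over the top decile: log order statistics against
    # -log(i/n); linearity with slope 1/alpha again indicates a regularly
    # varying tail.
    k = n // 10
    ax2.plot(-np.log(np.arange(1, k + 1) / n), np.log(mags[:k]), ".", ms=2)
    ax2.set(xlabel="-log(i/n)", ylabel="log |c|_(i)", title="Pareto Q-Q plot")
    fig.tight_layout()
    return fig
```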
Primary Area: learning theory
Submission Number: 15857