Pruning at Initialisation through the lens of Graphon Limit: Convergence, Expressivity, and Generalisation

Published: 30 Apr 2026, Last Modified: 24 Jun 2026, ICML 2026 regular, CC BY 4.0
TL;DR: This work proves that pruning masks from Pruning at Initialisation methods converge to graphons at infinite width, enabling derivations of expressivity and generalisation bounds for sparse networks.
Abstract: Pruning at Initialisation (PaI) methods discover sparse, trainable subnetworks before training, but their theoretical mechanisms remain elusive. Existing analyses are often limited to finite-width statistics, lacking a rigorous characterisation of the global sparsity patterns that emerge as networks grow large. In this work, we connect discrete pruning heuristics to graph limit theory via graphons, establishing the *graphon limit of PaI masks*. We introduce a *Factorised Saliency Model* that encompasses popular pruning criteria and prove that, under regularity conditions, the discrete masks generated by these algorithms converge to deterministic bipartite graphons. This limit framework establishes a novel topological taxonomy for sparse networks: while unstructured methods (e.g., Random, Magnitude) converge to homogeneous graphons representing uniform connectivity, data-driven methods (e.g., SNIP, GraSP) converge asymptotically to heterogeneous graphons that encode implicit feature selection. Leveraging this continuous characterisation, we derive two consequences. First, we prove a universal approximation theorem for sparse networks on active coordinate subspaces. Second, under the Graphon-NTK lazy-training regime, we connect the limiting graphon to NTK-style generalisation bounds and introduce a path-density interpretation of how sparse topology can modulate kernel alignment. Our results transform the study of sparse neural networks from combinatorial graph problems into a rigorous framework of continuous operators, offering a new mechanism for analysing expressivity and generalisation in sparse networks.
Lay Summary: Modern neural networks are often much larger than they need to be, so researchers try to remove unnecessary connections before training begins. This is called pruning at initialisation, and it can make models cheaper to train, but it has been unclear what kind of sparse structures these methods actually create. Our work shows that, as networks become very wide, the connection patterns produced by many pruning rules can be described by a simple continuous map, much like replacing a pixelated image with a smooth picture. This map reveals a clear difference between pruning methods: random pruning spreads connections almost uniformly, while data-dependent methods tend to concentrate connections around more informative inputs and neurons. Using this viewpoint, we show why sparse networks can still approximate useful functions when enough connections remain around the important input features. We also connect this structure to standard tools for understanding generalisation, suggesting that better-organised sparse connections can make models learn better. Overall, the paper provides a mathematical language for describing what pruning at initialisation does, not just how many connections it removes. This can help guide the design of more reliable sparse training methods that reduce computation while preserving performance.
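The contrast described above, between near-uniform masks and masks concentrated on informative inputs, can be illustrated with a small numerical sketch. It is not the paper's construction: the Factorised Saliency Model is not specified on this page, so the product form `row_factor * col_factor * |w|` below is an assumption made purely for illustration, as are the helper names `topk_mask` and `block_density`, the layer size, and the choice of `col_factor`. The sketch prunes one layer by global top-k and averages the resulting binary mask over a coarse grid, which gives an empirical step-function view of the mask's density.

```python
import numpy as np


def topk_mask(scores, density):
    """Keep the top `density` fraction of entries by score (global top-k)."""
    k = int(round(density * scores.size))
    # k-th largest score; entries at or above it are kept.
    threshold = np.partition(scores.ravel(), -k)[-k]
    return (scores >= threshold).astype(float)


def block_density(mask, blocks):
    """Average a binary mask over a blocks x blocks grid of equal cells."""
    n_out, n_in = mask.shape
    cells = mask.reshape(blocks, n_out // blocks, blocks, n_in // blocks)
    return cells.mean(axis=(1, 3))


rng = np.random.default_rng(0)
n_out, n_in, density, blocks = 512, 512, 0.2, 4
weights = rng.standard_normal((n_out, n_in))

# Magnitude-style score: depends only on i.i.d. weights.
magnitude_scores = np.abs(weights)

# ASSUMPTION: a hypothetical factorised score, not taken from the paper.
# Columns (inputs) are ordered so later inputs count as "more informative".
row_factor = np.ones(n_out)
col_factor = np.linspace(0.1, 2.0, n_in)
factorised_scores = row_factor[:, None] * col_factor[None, :] * np.abs(weights)

magnitude_graphon = block_density(topk_mask(magnitude_scores, density), blocks)
factorised_graphon = block_density(topk_mask(factorised_scores, density), blocks)

print(np.round(magnitude_graphon, 3))
print(np.round(factorised_graphon, 3))
```

Under these assumptions the magnitude-style grid stays close to the target density in every cell, while the factorised grid is denser in the columns with larger `col_factor`. Increasing `n_out` and `n_in` with `blocks` fixed reduces the cell-to-cell noise, which is the finite-size analogue of the limit the abstract refers to.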
Primary Area: Deep Learning->Theory
Keywords: Graphon, NTK, Sparse Neural Network, Pruning
Submission Number: 10491