Keywords: PAC-Bayes, intrinsic dimension, low-rank representations, Fisher information, dynamic architectures, pruning, generalization, spectral analysis
TL;DR: We prove that the generalization error of dynamic neural architectures scales with the intrinsic dimension of the Fisher information, not the parameter count, and show that Fisher-score pruning provably reduces this dimension.
Abstract: We study the generalization of neural networks whose architectures are modified during training via pruning, quantization, or width expansion. Extending the PAC-Bayes framework of MacAulay et al. (2023) to this dynamic setting, we prove that the generalization error scales with the *intrinsic dimension* $d_{\mathrm{int}}$ of the Fisher information at the final iterate, rather than with the ambient parameter count $p$. Our central result (Theorem 3.3) gives a $\widetilde{O}\!\bigl(\sqrt{d_{\mathrm{int}} / n}\,\bigr)$ upper bound, which is near-tight for a linear model class; the gap for deep neural networks remains open. A matching minimax lower bound (Theorem 3.4) confirms that this dependence is unavoidable. We further show that Fisher-score pruning provably reduces the intrinsic dimension under spectral-gap conditions (Theorem 4.1), yielding a conditional end-to-end improvement that additionally requires the stability Assumptions (C1)–(C3). All results are purely theoretical; no experimental claims are made. Our analysis identifies the conditions under which the theory predicts, or fails to predict, a generalization improvement, including expansion–contraction cycles and spectra without a clear spectral gap.
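For concreteness, below is a minimal, hypothetical NumPy sketch of the quantities discussed above. The specific choices made here, a ridge-style effective dimension as a stand-in for $d_{\mathrm{int}}$ and diagonal Fisher scores $F_{ii}\,\theta_i^2$ for pruning, are illustrative assumptions rather than the paper's definitions; the sketch only shows how an intrinsic dimension can sit well below the ambient parameter count and shrink further on the support retained by Fisher-score pruning.

```python
import numpy as np


def empirical_fisher(grads):
    """grads: (n, p) per-example gradients; returns the (p, p) empirical Fisher G^T G / n."""
    return grads.T @ grads / grads.shape[0]


def intrinsic_dimension(fisher, mu=1e-3):
    """Illustrative ridge-style effective dimension: sum_i lambda_i / (lambda_i + mu)."""
    eigvals = np.clip(np.linalg.eigvalsh(fisher), 0.0, None)  # clamp tiny negative eigenvalues
    return float(np.sum(eigvals / (eigvals + mu)))


def fisher_score_keep(theta, fisher, keep_frac=0.3):
    """Indices of the coordinates with the largest diagonal Fisher scores F_ii * theta_i^2."""
    scores = np.diag(fisher) * theta**2
    k = max(1, int(keep_frac * theta.size))
    return np.sort(np.argsort(scores)[-k:])


# Toy check: low-rank gradient structure gives d_int far below the ambient p.
rng = np.random.default_rng(0)
n, p, rank = 200, 50, 5
grads = rng.normal(size=(n, rank)) @ rng.normal(size=(rank, p))
theta = rng.normal(size=p)
F = empirical_fisher(grads)

keep = fisher_score_keep(theta, F)
F_pruned = F[np.ix_(keep, keep)]  # Fisher restricted to the retained support

print(f"ambient p = {p}, d_int ~ {intrinsic_dimension(F):.2f}")
print(f"d_int on pruned support ~ {intrinsic_dimension(F_pruned):.2f}")
print(f"bound scale sqrt(d_int / n) ~ {np.sqrt(intrinsic_dimension(F) / n):.3f}")
```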
Submission Number: 1