A Rescaling-Invariant Lipschitz Bound Based on Path-Metrics for Modern ReLU Network Parameterizations
TL;DR: This paper bounds the distance between the functions implemented by ReLU networks via a parameter metric that is invariant to rescaling symmetries.
Abstract: Robustness with respect to weight perturbations underpins guarantees for generalization, pruning, and quantization. Existing
guarantees rely on *Lipschitz bounds in parameter space*, cover only plain feed-forward MLPs, and break under the ubiquitous neuron-wise rescaling symmetry of ReLU networks. We prove a new Lipschitz inequality expressed through the $\ell^{1}$-*path-metric* of the weights. The bound is (i) *rescaling-invariant* by construction and (ii) applicable to any ReLU-DAG architecture with any combination of
convolutions, skip connections, pooling, and frozen (inference-time) batch normalization, thus encompassing ResNets, U-Nets, VGG-style CNNs, and more. By respecting the network’s natural symmetries, the new bound strictly sharpens prior parameter-space bounds and can be computed in two forward passes. To illustrate its utility, we derive from it a symmetry-aware pruning criterion and
show, in a proof-of-concept experiment on a ResNet-18 trained on ImageNet, that its pruning performance matches that of classical magnitude pruning while being invariant to arbitrary neuron-wise rescalings.
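The "two forward passes" claim builds on a classical identity: for a bias-free ReLU MLP, the $\ell^{1}$ path-norm equals a single forward pass of the entry-wise absolute weights on the all-ones input (ReLU acts as the identity on the resulting non-negative activations). Below is a minimal NumPy sketch, our own illustration rather than the paper's code (the layer shapes and the rescaled unit are arbitrary assumptions), that computes this quantity for a two-layer network and checks the rescaling invariance claimed above:

```python
# Sketch only: l^1 path-norm of a bias-free two-layer ReLU MLP, computed as
# one "forward pass" through the entry-wise absolute weights on an all-ones
# input. Shapes and the rescaled unit below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 8))   # layer 1: 8 inputs -> 16 hidden units
W2 = rng.standard_normal((4, 16))   # layer 2: 16 hidden units -> 4 outputs

def l1_path_norm(W1, W2):
    # Sum over all input-output paths of the product of |weights| along
    # the path, i.e. sum of the entries of |W2| |W1| 1.
    return float(np.sum(np.abs(W2) @ np.abs(W1) @ np.ones(W1.shape[1])))

# Neuron-wise rescaling symmetry of ReLU: scaling a hidden unit's incoming
# weights by lambda > 0 and its outgoing weights by 1/lambda leaves the
# network function unchanged; the path-norm is unchanged as well.
lam = 7.3
D = np.ones(16)
D[3] = lam
W1_res = np.diag(D) @ W1            # incoming weights of unit 3 scaled by lam
W2_res = W2 @ np.diag(1.0 / D)      # outgoing weights scaled by 1/lam

print(l1_path_norm(W1, W2))         # equal up to floating-point error
print(l1_path_norm(W1_res, W2_res))
```

The paper's path-metric compares two parameter vectors and is more refined than this single path-norm, but the same forward-pass mechanics suggest why it is cheap to evaluate.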
Lay Summary: Imagine adjusting the settings on a complex sound system with countless knobs and switches. In the world of artificial intelligence, these "knobs" are the internal settings of neural networks. People often tweak them to make AI models more efficient (e.g., faster and/or cheaper to use). However, even small adjustments can sometimes lead to unexpected changes in how the AI behaves. Traditional methods for checking this stability often don’t apply to modern, complex AI models, and they can predict severe instability even for changes that are known to be harmless.
Our research introduces a new method that extends stability checks to modern AI architectures and can significantly reduce these misleading warnings. In particular, it ensures that harmless knob changes can’t trigger overly negative predictions about the model’s behavior. We also show that this method can help simplify AI models by safely removing unneeded parts, without sacrificing performance. This opens the door to more reliable and efficient AI systems in real-world use.
Primary Area: Theory->Deep Learning
Keywords: relu neural networks, Lipschitz, rescaling-symmetry, path-lifting
Submission Number: 798