Theory of Minimal Weight Perturbations in Deep Networks and Its Applications to Low-Rank-Activated Backdoor Attacks

Published: 02 Mar 2026 · Last Modified: 06 Mar 2026 · ICLR 2026 Trustworthy AI · CC BY 4.0
Keywords: Adversarial Robustness, Adversarial Attack, Backdoor Attack, Low-Rank Approximation, Pruning
TL;DR: We derive exact formulae for the smallest weight perturbations needed to change a network’s output, relate them to Lipschitz-based robustness guarantees, and show that low-rank compression can reliably activate backdoor attacks once these bounds are exceeded.
Abstract: The minimal-norm weight perturbations of DNNs required to achieve a specified change in output are derived, and the factors determining their size are discussed. These exact single-layer formulae are contrasted with more generic multi-layer Lipschitz-constant-based robustness guarantees; the two are observed to be of the same order, indicating similar strength in the guarantees they provide. These results are applied to precision-modification-activated backdoor attacks, establishing provable compression thresholds below which such attacks cannot succeed; it is further shown empirically that low-rank compression can reliably activate latent backdoors while preserving full-precision accuracy. The derived expressions reveal how back-propagated margins govern layer-wise sensitivity and provide certifiable guarantees on the smallest parameter updates consistent with a desired output shift.
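The single-layer case of the minimal-perturbation result has a well-known closed form that may help the reader's intuition. The sketch below (not the paper's derivation; all sizes and data are hypothetical) shows the classical least-norm solution: the smallest Frobenius-norm update dW satisfying (W + dW)x = Wx + delta is the rank-one matrix dW = delta x^T / ||x||^2, whose norm is ||delta|| / ||x||.

```python
# Minimal sketch of the least-norm single-layer perturbation (assumed setup,
# not the paper's exact formulae): the smallest-Frobenius-norm dW realizing
# a prescribed output shift delta at input x is rank one.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 8))      # layer weights (hypothetical sizes)
x = rng.standard_normal(8)           # a fixed input
delta = rng.standard_normal(5)       # desired change in the layer's output

dW = np.outer(delta, x) / np.dot(x, x)   # rank-one minimal-norm perturbation

# The perturbed layer realizes exactly the requested output shift, and the
# Frobenius norm matches the closed-form value ||delta|| / ||x||.
assert np.allclose((W + dW) @ x - W @ x, delta)
assert np.isclose(np.linalg.norm(dW, "fro"),
                  np.linalg.norm(delta) / np.linalg.norm(x))
```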
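The compression-threshold claim can likewise be illustrated with a standard spectral-norm bound (again a hedged sketch, not the paper's construction): truncating W to rank k via SVD moves any output by at most sigma_{k+1} ||x||, so an output margin larger than this quantity cannot be flipped by the compression alone.

```python
# Hedged illustration of a compression threshold via truncated SVD
# (hypothetical sizes and data): the output deviation under rank-k
# truncation is bounded by the first discarded singular value times ||x||.
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((6, 10))
x = rng.standard_normal(10)

U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 3
Wk = (U[:, :k] * s[:k]) @ Vt[:k]          # rank-k truncated-SVD compression

deviation = np.linalg.norm(W @ x - Wk @ x)
bound = s[k] * np.linalg.norm(x)          # sigma_{k+1} * ||x||
assert deviation <= bound + 1e-9
print(f"output shift {deviation:.4f} <= certified bound {bound:.4f}")
```

In this picture, a latent backdoor stays dormant as long as the decision margin exceeds the bound, and compression past the threshold is what can activate it.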
Submission Number: 17