Layer-wise regularized adversarial training using layers sustainability analysis framework

Published: 27 Mar 2023 · Last Modified: 13 Nov 2025 · OpenReview Archive Direct Upload · CC BY 4.0
Abstract: Deep neural network models are now used in a wide range of artificial intelligence applications, and strengthening them against adversarial attacks is of particular importance. A common defense against adversarial attacks is adversarial training, which must strike a trade-off between robustness and generalization. This paper introduces Layer Sustainability Analysis (LSA), a novel framework for analyzing layer vulnerability in an arbitrary neural network under adversarial attack. LSA serves as a toolkit for assessing deep neural networks and for extending adversarial training approaches toward more sustainable model layers through layer monitoring and analysis. The framework identifies a list of Most Vulnerable Layers (the MVL list) of a given network, using the relative error between clean and adversarial representations as the measure of each layer's sustainability. The proposed approach to obtaining robust neural networks adds a layer-wise regularization (LR) term, guided by the LSA results, to adversarial training (AT); the resulting AT-LR procedure can be combined with any benchmark adversarial attack to reduce the vulnerability of network layers and to improve conventional adversarial training. The approach performs well both theoretically and experimentally on state-of-the-art multilayer perceptron and convolutional neural network architectures. Additionally, a robustness-and-generalization score (R&G score) is defined to better evaluate each adversarially trained model across a variety of significant perturbations. Compared with its corresponding base adversarial training, AT-LR increases the R&G score on the Moon, MNIST, and CIFAR-10 benchmark datasets by 56.52%, 75.82%, and 6.54%, respectively, for more significant perturbations. The LSA framework is publicly available at https://github.com/khalooei/LSA.
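
To make the layer-monitoring idea concrete, the sketch below computes a per-layer relative error between clean and adversarial activations in PyTorch. This is a minimal sketch under stated assumptions: the relative error is taken as the norm of the representation difference normalized by the clean-representation norm, and the function name, hook-based capture, and `layers` argument are illustrative rather than the published LSA API.

```python
import torch

def layer_relative_errors(model, layers, x_clean, x_adv):
    """Relative error of each monitored layer's representation under
    adversarial input. `layers` maps a name to the module whose output
    is recorded via forward hooks (illustrative, not the LSA toolkit API).
    """
    acts = {"clean": {}, "adv": {}}

    def make_hook(store, name):
        def hook(module, inputs, output):
            store[name] = output.detach()
        return hook

    # Run the model once on each batch, recording monitored activations.
    for key, batch in (("clean", x_clean), ("adv", x_adv)):
        handles = [m.register_forward_hook(make_hook(acts[key], n))
                   for n, m in layers.items()]
        with torch.no_grad():
            model(batch)
        for h in handles:
            h.remove()

    errors = {}
    for name in layers:
        c, a = acts["clean"][name], acts["adv"][name]
        # Relative error: ||phi_l(x_adv) - phi_l(x)|| / ||phi_l(x)||
        errors[name] = (torch.norm(a - c) / torch.norm(c).clamp_min(1e-12)).item()
    return errors
```

Sorting the returned dictionary in descending order of error yields a candidate MVL list: the layers whose representations drift most under attack are the first candidates for layer-wise regularization.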
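Along the same lines, a minimal sketch of an AT-LR-style objective is shown below: adversarial cross-entropy plus a weighted penalty on the representation drift of the most vulnerable layers. The exact regularizer, the per-layer weights `gammas`, and the no-gradient treatment of the clean branch are assumptions for illustration, not the paper's definitive formulation.

```python
import torch
import torch.nn.functional as F

def at_lr_loss(model, mvl_layers, x_clean, x_adv, y, gammas):
    """Sketch of an AT-LR training objective. `mvl_layers` maps layer
    names to modules (the MVL list from LSA); `gammas` maps the same
    names to regularization weights. Both are hypothetical arguments.
    """
    acts = {"clean": {}, "adv": {}}

    def make_hook(store, name):
        def hook(module, inputs, output):
            store[name] = output
        return hook

    # Forward pass on the adversarial batch, keeping gradients so the
    # regularizer can shape the vulnerable layers during training.
    handles = [m.register_forward_hook(make_hook(acts["adv"], n))
               for n, m in mvl_layers.items()]
    logits_adv = model(x_adv)
    for h in handles:
        h.remove()

    # Forward pass on the clean batch; the clean branch is treated as a
    # fixed reference here (an assumption of this sketch).
    handles = [m.register_forward_hook(make_hook(acts["clean"], n))
               for n, m in mvl_layers.items()]
    with torch.no_grad():
        model(x_clean)
    for h in handles:
        h.remove()

    # Adversarial cross-entropy plus weighted relative-error penalties.
    loss = F.cross_entropy(logits_adv, y)
    for name in mvl_layers:
        c, a = acts["clean"][name], acts["adv"][name]
        loss = loss + gammas[name] * torch.norm(a - c) / torch.norm(c).clamp_min(1e-12)
    return loss
```

In this reading, the regularizer directly penalizes the same relative-error quantity that LSA uses to rank layers, which is why the procedure composes with any benchmark attack used to generate `x_adv`.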