Building Neural Networks that are Robust to Adversarial Examples Using Probabilistic Loss Function

20 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | Readers: Everyone | CC BY 4.0
Keywords: Adversarial Training, Deep Neural Networks, FGSM, Defensive Distillation
TL;DR: Building neural networks that are robust to adversarial examples using a probabilistic loss function, for both classification and regression
Abstract: Adversarial examples are an Achilles heel of deep neural networks, robbing them of their functional performance in mission-critical applications. This work proposes a novel method of making deep neural networks robust to adversarial examples by replacing the loss function with a soft version of cross-entropy loss for classification problems. For regression problems, the data distribution is examined using Bayesian techniques (a Gaussian Mixture Model), and the loss function is modified using the posterior probability distribution. The approach is justified by a mathematical derivation and is applied to MNIST and ImageNet classification problems to demonstrate its robustness to FGSM and Carlini-Wagner L2 attacks. The approach avoids the overhead of training an additional model, which is inherent in adversarial-distillation-based methods.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 23623
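
The abstract names two ingredients that can be illustrated concretely: a cross-entropy loss computed against soft (non one-hot) targets, and the FGSM attack used for evaluation. The following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not specify the "soft version" of cross-entropy, so standard label smoothing is used here purely as a stand-in, and the linear softmax classifier and the names `smooth_labels`, `soft_cross_entropy`, and `fgsm` are hypothetical and chosen for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def smooth_labels(labels, num_classes, eps=0.1):
    # ASSUMPTION: label smoothing stands in for the paper's unspecified soft targets.
    one_hot = np.eye(num_classes)[labels]
    return (1.0 - eps) * one_hot + eps / num_classes

def soft_cross_entropy(logits, soft_targets):
    # Mean cross-entropy between soft target distributions and model predictions.
    log_p = np.log(softmax(logits) + 1e-12)
    return float(-(soft_targets * log_p).sum(axis=-1).mean())

def fgsm(x, W, b, soft_targets, epsilon):
    # FGSM step for a hypothetical linear model with logits = x @ W + b.
    # For targets that sum to 1, d(loss)/d(logits) = softmax(logits) - targets,
    # so the input gradient is (p - t) @ W.T, up to a positive 1/n factor.
    p = softmax(x @ W + b)
    grad_x = (p - soft_targets) @ W.T
    # Move each input coordinate by epsilon in the direction that increases the loss.
    return x + epsilon * np.sign(grad_x)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 5))        # toy inputs, not MNIST or ImageNet
    W = rng.normal(size=(5, 3))        # toy weights for 3 classes
    b = np.zeros(3)
    y = rng.integers(0, 3, size=8)
    t = smooth_labels(y, num_classes=3)
    x_adv = fgsm(x, W, b, t, epsilon=0.1)
    print("clean loss:      ", soft_cross_entropy(x @ W + b, t))
    print("adversarial loss:", soft_cross_entropy(x_adv @ W + b, t))
```

For a deep network the input gradient would come from automatic differentiation rather than the closed form used above; the closed form is only valid for the linear model assumed in this sketch.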