Building Neural Networks that are Robust to Adversarial Examples Using Probabilistic Loss Function

Samit Ahlawat

20 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0

Keywords: Adversarial Training, Deep Neural Networks, FGSM, Defensive Distillation

TL;DR: A probabilistic loss function for building neural networks that are robust to adversarial examples, covering both classification and regression.

Abstract: Adversarial examples are an Achilles' heel of deep neural networks, robbing them of their functional performance in mission-critical applications. This work proposes a novel method of making deep neural networks robust to adversarial examples by modifying the loss function to a soft version of cross-entropy loss for classification problems. For regression problems, the data distribution is examined using Bayesian techniques (a Gaussian mixture model), and the loss function is modified using a posterior probability distribution. The approach is justified by a mathematical derivation and is evaluated on MNIST and ImageNet classification problems to demonstrate its robustness to FGSM and Carlini-Wagner L2 attacks. This approach alleviates the overhead of training an additional model that is inherent in adversarial-distillation-based methods.

Primary Area: foundation or frontier models, including LLMs

Submission Number: 23623
