Enhancing Robustness of Deep Learning via Unified Latent Representation

25 Sept 2024 (modified: 25 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: deep learning robustness, out-of-distribution inputs, adversarial examples, VAE latent representation
TL;DR: We propose using VAEs to enhance DNN robustness against both adversarial examples and OoD inputs by leveraging similarities in their latent space representations, allowing for their detection without retraining classifiers.
Abstract:

Adversarial examples and Out-of-Distribution (OoD) inputs are two major failure modes of image classifiers based on Deep Neural Networks (DNNs). In particular, DNNs tend to be overconfident in their predictions, assigning an incorrect category with high probability. In this work, we propose a unified solution, based on the Variational Autoencoder (VAE), that tackles both input types. First, we scrutinize recent successful results on detecting OoD inputs using Bayesian epistemic uncertainty estimation over the weights of VAEs. Surprisingly, and contrary to previous claims in the literature, we find that comparable detection performance can be obtained with standard importance sampling under the classical VAE formulation. Second, we dissect the marginal likelihood approximation, analyze the primary source of variation that separates inliers from outliers, and establish a link to recent promising results on detecting outliers via latent holes. Finally, we observe that adversarial examples and OoD inputs have similar latent representations. This insight allows us to develop methods that automatically distinguish between the two by exploiting their dissimilarities in the input space. The proposed approach pre-trains a VAE on the given input data so that it acts as a gatekeeper, achieving two goals: defending the DNN classifier against potential attacks and flagging OoD inputs. Once pre-trained, the VAE can be plugged in as a filter in front of any DNN image classifier of arbitrary architecture trained on the same data, without retraining the classifier or accessing its layers and weights.
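
For reference, the "standard procedure of importance sampling with the classical formulation of VAE" mentioned above presumably refers to the usual Monte Carlo estimator of the marginal likelihood; the sketch below is not part of the submission itself and uses the conventional notation ($q_\phi$ for the encoder, $p_\theta$ for the decoder, $K$ importance samples) as an assumption:

$$\log p_\theta(x) \;\approx\; \log \frac{1}{K} \sum_{k=1}^{K} \frac{p_\theta(x \mid z_k)\, p(z_k)}{q_\phi(z_k \mid x)}, \qquad z_k \sim q_\phi(z \mid x).$$

Under this reading, the gatekeeper would flag an input as an outlier or a potential adversarial example when its estimated $\log p_\theta(x)$ falls below a chosen threshold, before the input ever reaches the downstream classifier.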

Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4537
