Feature Extraction Based on Denoising Auto Encoder for Classification of Adversarial Examples

Yuma Yamasaki, Minoru Kuribayashi, Nobuo Funabiki, Huy H. Nguyen, Isao Echizen

APSIPA ASC 2021 (modified: 17 Apr 2023)

Abstract: Adversarial examples are recognized as one of the threats to machine learning techniques: tiny perturbations added to multimedia content cause a misclassification in a target CNN-based model. In conventional studies, such perturbations are removed using a few filters, and the features for classification are extracted from observations of the CNN-based model's output. However, because these filters are well known, an attacker may be able to adapt an adversarial attack to them and fool the detector. In this study, we investigated the effectiveness of certain auto encoders (AEs) in extracting the traces of perturbations. Even if the structure of the AE is leaked, differences in the training datasets make it difficult to adjust the attack accordingly. The effectiveness of the AE designed in this study was evaluated experimentally, both on its own and in combination with some known filters.
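
The detection idea summarized in the abstract (denoise the input, then observe how the CNN-based model's output changes) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: `toy_classifier` is a hypothetical linear stand-in for the target CNN, `mean_filter` is a hypothetical stand-in for the denoiser (the paper uses a trained denoising AE), and the feature layout in `extract_features` is only one plausible choice.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def toy_classifier(x, W):
    # Hypothetical stand-in for the target CNN: flatten, linear layer, softmax.
    return softmax(x.reshape(x.shape[0], -1) @ W)

def mean_filter(x):
    # Hypothetical stand-in for the denoiser: 3x3 mean filter with edge padding.
    # In the paper's setting this role would be played by a trained denoising AE.
    p = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
    h, w = x.shape[1], x.shape[2]
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += p[:, dy:dy + h, dx:dx + w]
    return out / 9.0

def extract_features(x, classify, denoise):
    # Assumed feature layout: class probabilities before denoising,
    # after denoising, and their difference.
    p_raw = classify(x)
    p_den = classify(denoise(x))
    return np.concatenate([p_raw, p_den, p_raw - p_den], axis=1)

rng = np.random.default_rng(0)
W = rng.normal(size=(8 * 8, 10))   # random weights for the toy 10-class model
x = rng.random((4, 8, 8))          # four random 8x8 grayscale "images"
features = extract_features(x, lambda a: toy_classifier(a, W), mean_filter)
print(features.shape)  # (4, 30)
```

In such a setup, the feature vectors would be fed to a separate binary classifier that labels each input as clean or adversarial; that classifier is omitted here because the abstract does not describe it.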
