Evaluation of Adversarial Examples Based on Original Definition

Published: 01 Jan 2024, Last Modified: 13 May 2025 · HCI (62) 2024 · CC BY-SA 4.0
Abstract: Adversarial Examples (AEs) can induce misclassification in neural networks by adding small noise to input data. The success of an adversarial attack is implicitly defined as inducing misclassification without the added noise being discernible to humans. However, previous studies have mainly focused on whether AEs cause machine learning models to misclassify, largely ignoring whether the noise is discernible. To address this gap, we evaluated AEs based on the original definition of attack success. Using large-scale crowdsourcing surveys, we investigated the proportion of successful AEs created under the same conditions as in a previous study. Our findings demonstrate that the performance of AEs in inducing misclassification decreases significantly when they are evaluated according to this original definition of attack success.
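For readers unfamiliar with how "small noise" is typically added, the sketch below illustrates one common gradient-based attack (FGSM). The abstract does not name the attack method used in the study, so this is only a generic illustration; the `model`, input tensor `x`, `label`, and the perturbation budget `epsilon=0.03` are assumed for the example.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, label, epsilon=0.03):
    """Illustrative FGSM attack: perturb x to increase the model's loss.

    Not the paper's method; a minimal sketch of how small adversarial
    noise is commonly generated.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    # Step in the direction that increases the loss, bounded by epsilon.
    x_adv = x + epsilon * x.grad.sign()
    # Keep the perturbed image in the valid pixel range.
    return x_adv.clamp(0.0, 1.0).detach()
```

Whether such a perturbation remains indiscernible to humans depends on epsilon and the image content, which is precisely the aspect of attack success the abstract argues has been under-evaluated.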