A baseline for detecting Textual Attacks in Sentiment Analysis Classification using Density Estimation

21 Mar 2023, OpenReview Archive Direct Upload, Readers: Everyone
Abstract: Building NLP models that resist adversarial manipulation has become a key element of research in recent years. While models are becoming more reliable and robust, concerns about the exploitation of their flaws call for tools that guarantee their robustness and protect them against attacks. As a result, adversarial defenses have been actively developed over the past decade, showing convincing results in improving the robustness of models and their resistance to attacks. However, another crucial tool in protecting against attacks is better detection of word-level adversarial attacks. In this paper, we evaluate the performance of two attack detection methods on two prepared datasets and two transformer-based models. Our main goal is to investigate and confirm the results obtained in Yoo et al., using density estimation.
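As an illustration of what density-estimation-based detection can look like, the following is a minimal sketch, not the method evaluated in the paper or in Yoo et al. It assumes each input has already been mapped to a fixed-size feature vector (for example, a sentence embedding from a transformer), fits a single Gaussian to features of clean inputs, and flags inputs whose squared Mahalanobis distance exceeds a threshold chosen on clean data. The function names `fit_gaussian` and `anomaly_score`, the ridge term, the 95% quantile, and the synthetic features are all assumptions made for this example.

```python
import numpy as np

def fit_gaussian(features, ridge=1e-3):
    """Estimate mean and precision matrix of clean features (hypothetical helper)."""
    mean = features.mean(axis=0)
    # A small ridge term keeps the covariance matrix invertible.
    cov = np.cov(features, rowvar=False) + ridge * np.eye(features.shape[1])
    return mean, np.linalg.inv(cov)

def anomaly_score(x, mean, precision):
    """Squared Mahalanobis distance; up to a constant, the negative
    Gaussian log-density, so higher means less likely under the clean fit."""
    diff = x - mean
    return np.einsum("ij,jk,ik->i", diff, precision, diff)

# Synthetic stand-ins for embeddings: real features would come from a model.
rng = np.random.default_rng(0)
clean_train = rng.normal(0.0, 1.0, size=(500, 8))
clean_test = rng.normal(0.0, 1.0, size=(100, 8))
shifted = rng.normal(3.0, 1.0, size=(100, 8))  # assumed stand-in for attacked inputs

mean, precision = fit_gaussian(clean_train)
# Threshold chosen so that about 5% of clean training inputs are flagged.
threshold = np.quantile(anomaly_score(clean_train, mean, precision), 0.95)

clean_flags = anomaly_score(clean_test, mean, precision) > threshold
adv_flags = anomaly_score(shifted, mean, precision) > threshold
print("flagged clean:", clean_flags.mean(), "flagged shifted:", adv_flags.mean())
```

In practice the threshold trades false alarms on clean inputs against missed attacks, and real adversarial examples are usually far less separable than the shifted synthetic cluster used here.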