Abstract: The growing popularity and use of NLP technologies has led to increased interest in adversarial attacks, which can significantly degrade the performance and reliability of machine learning models. It is therefore crucial to develop methods that protect these systems against such attacks and detect them in real time to mitigate their effects. In this study, we explore different approaches to increasing the robustness of NLP models against adversarial attacks, comparing a simple baseline that fine-tunes a RoBERTa model with other methods that make use of the model's embeddings. Our findings can potentially contribute to the development of more effective defense mechanisms against adversarial attacks on NLP models.
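As a rough illustration of the fine-tuning baseline mentioned above, the sketch below fine-tunes a RoBERTa classifier with the Hugging Face `transformers` and `datasets` libraries. The abstract does not specify the dataset, task, or hyperparameters, so IMDB and the values shown here are placeholder assumptions, not the authors' actual setup; an adversarially perturbed evaluation set would replace the clean test split when measuring robustness.

```python
# Minimal sketch of a RoBERTa fine-tuning baseline (assumed setup, not the paper's).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

# Stand-in classification dataset; the paper's task/dataset is not specified.
raw = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

encoded = raw.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="roberta-baseline",          # placeholder output path
    num_train_epochs=3,                     # illustrative hyperparameters
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["test"],           # swap in adversarial examples to test robustness
)

trainer.train()
print(trainer.evaluate())
```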