AdversNLP: A Practical Guide to Assessing NLP Robustness Against Text Adversarial Attacks

Published: 20 Jun 2023, Last Modified: 07 Aug 2023
Venue: AdvML-Frontiers 2023
Keywords: Adversarial attacks, NLP, Text adversarial attacks, NLP applications audit, Openattack, Textattack, Atlas Mitre, shielding techniques, robustness KPIs
TL;DR: AdversNLP, a practical guide to assessing the robustness of NLP applications against text-based adversaries.
Abstract: The emergence of powerful language models in natural language processing (NLP) has sparked a wave of excitement about their potential to revolutionize decision-making. However, this excitement should be tempered by their vulnerability to adversarial attacks: carefully perturbed inputs able to fool a model into inaccurate decisions. In this paper, we present AdversNLP, a practical framework to assess the robustness of NLP applications against text-based adversaries. Our framework builds on and extends the technical capabilities of established NLP adversarial attack tools (e.g., TextAttack) and provides a tailored audit guide for navigating the landscape of threats to NLP applications. AdversNLP illustrates best practices and exposes vulnerabilities through customized attack recipes, presenting evaluation metrics in the form of key performance indicators (KPIs). Our study demonstrates the severity of the threat posed by adversarial attacks and the need for more initiatives bridging the gap between research contributions and industrial applications.
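As an illustration of the kind of robustness KPIs the abstract refers to, the sketch below derives three commonly reported metrics (clean accuracy, accuracy under attack, attack success rate) from the outcome counts of an adversarial attack run. The function name and the counts are hypothetical; in practice the counts would come from running an attack recipe (e.g., via TextAttack) against the model under audit.

```python
# Hypothetical sketch: deriving robustness KPIs from attack outcomes.
# Counts are illustrative placeholders, not results from the paper.

def robustness_kpis(successful, failed, skipped):
    """Compute common adversarial-robustness KPIs.

    successful: attacks that flipped the model's prediction
    failed:     attacks the model resisted (prediction unchanged)
    skipped:    inputs the model already misclassified (no attack needed)
    """
    total = successful + failed + skipped
    attacked = successful + failed  # only correctly classified inputs get attacked
    return {
        "clean_accuracy": attacked / total,            # correct before any attack
        "accuracy_under_attack": failed / total,       # still correct after attack
        "attack_success_rate": successful / attacked,  # attempted attacks that succeed
    }

# Example with placeholder counts:
kpis = robustness_kpis(successful=60, failed=30, skipped=10)
print(kpis)  # clean_accuracy=0.9, accuracy_under_attack=0.3, attack_success_rate≈0.667
```

A low accuracy under attack combined with high clean accuracy is the gap such an audit is designed to surface.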
Submission Number: 54