TRIESTE: translation based defense for text classifiers

Published: 01 Jan 2023, Last Modified: 13 Nov 2024. J. Ambient Intell. Humaniz. Comput. 2023. License: CC BY-SA 4.0
Abstract: The field of natural language processing (NLP) has evolved significantly with the advent of state-of-the-art models, which have revolutionised how NLP tasks such as machine translation and sentiment analysis are performed. However, despite their high efficacy and strong performance, these models are vulnerable to adversarial attacks. Such attacks introduce perturbations that are imperceptible to humans but can severely degrade the model's learning and prediction accuracy. Current defenses for text data, such as spell-checking and adversarial training, have limitations against state-of-the-art adversarial attacks. This paper puts forward an effective transformation-based defense, TRIESTE (TRanslatIon basEd defenSe for Text classifiErs). The proposed defense overcomes the shortcomings of existing defenses by translating the input text from the source language to a target language and then back to the source language before passing it to the text classifier. Because translation takes the entire text into consideration, the back-translated text keeps a sentiment similar to that of the input text while the adversarial perturbations are removed. Rigorous evaluation on publicly available datasets shows that TRIESTE is successful against state-of-the-art attacks without a significant drop in classifier accuracy.
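The pipeline described in the abstract (translate to a target language, translate back, then classify) can be sketched as a thin wrapper around any translator and classifier. This is a minimal sketch under stated assumptions, not the authors' implementation: the translator and classifier are passed in as plain functions, German as the pivot language is an arbitrary choice, and `toy_translate` / `toy_classify` are hypothetical stand-ins (a tiny word-lookup table with fuzzy matching via Python's `difflib`) used only to make the example runnable. A real deployment would plug in an actual machine-translation model, whose whole-sentence view is what the abstract credits with removing perturbations.

```python
import difflib
from typing import Callable, Dict

# A translator maps (text, source_lang, target_lang) -> translated text.
Translator = Callable[[str, str, str], str]
# A classifier maps text -> label.
Classifier = Callable[[str], str]


def trieste_defend(text: str, translate: Translator, classify: Classifier,
                   source: str = "en", pivot: str = "de") -> str:
    """Back-translate `text` through a pivot language, then classify the result."""
    pivoted = translate(text, source, pivot)      # source -> target language
    restored = translate(pivoted, pivot, source)  # target -> back to source
    return classify(restored)


# --- Hypothetical toy stand-ins for illustration only (NOT a real MT system) ---
EN_DE: Dict[str, str] = {
    "the": "der", "movie": "film", "was": "war",
    "terrible": "schrecklich", "great": "großartig",
}
DE_EN: Dict[str, str] = {de: en for en, de in EN_DE.items()}
TABLES = {("en", "de"): EN_DE, ("de", "en"): DE_EN}


def toy_translate(text: str, src: str, tgt: str) -> str:
    """Word-by-word lookup; misspelled words snap to the closest known word."""
    table = TABLES[(src, tgt)]
    out = []
    for tok in text.lower().split():
        if tok not in table:
            match = difflib.get_close_matches(tok, list(table), n=1, cutoff=0.6)
            tok = match[0] if match else tok  # keep unknown words unchanged
        out.append(table.get(tok, tok))
    return " ".join(out)


def toy_classify(text: str) -> str:
    """Keyword classifier: 'negative' only if it sees the exact word 'terrible'."""
    return "negative" if "terrible" in text.split() else "positive"


attacked = "the movie was terrib1e"  # character-level perturbation of "terrible"
print(toy_classify(attacked))                                 # positive
print(trieste_defend(attacked, toy_translate, toy_classify))  # negative
```

In this toy run the perturbed token "terrib1e" fools the keyword classifier on its own, but the round trip through the pivot language restores "terrible", so the defended prediction is negative. Because `trieste_defend` depends only on the two callables, swapping in a real translator or classifier leaves the wrapper unchanged.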