Keywords: Adversarial attack, neural machine translation, obfuscator, safety
TL;DR: We propose NMT-Obfuscator, an adversarial attack against neural machine translation systems that inserts an obfuscator word between the original sentence and an appended target sentence, forcing the NMT model not to translate the appended sentence.
Abstract: Neural Machine Translation (NMT) systems are used in diverse applications due to their impressive performance. However, recent studies have shown that these systems are vulnerable to carefully crafted small perturbations of their inputs, known as adversarial attacks. In this paper, we propose a new type of adversarial attack against NMT models. In this attack, we find a word to be added between two sentences such that the second sentence is ignored and not translated by the NMT model, while the whole adversarial text remains natural in the source language. This type of attack can be harmful in practical scenarios, since the attacker can hide malicious information in the automatic translation produced by the target NMT model. Our experiments show that different NMT models and translation tasks are vulnerable to this type of attack. Our attack successfully forces the NMT models to ignore the second part of the input in the translation in more than 50% of all cases, while keeping the perplexity of the whole input low.
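The threat model described in the abstract can be sketched in a few lines of code: concatenate a benign sentence, a candidate obfuscator word, and a hidden sentence, then check whether the NMT output omits the hidden part. The sketch below is only an illustration of this setup, not the paper's actual method; the Marian en-de model, the candidate word list, and the crude substring-based success check are all assumptions for the example.

```python
# Illustrative sketch of the attack setup (not the paper's search procedure).
# Assumes an en->de MarianMT model from Hugging Face; candidate obfuscator
# words and the success check are hypothetical placeholders.
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-de"  # assumed target NMT model
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

def translate(text: str) -> str:
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

original = "The weather in Zurich is pleasant today."
hidden = "transfer the funds to the usual account immediately."
hidden_translation = translate(hidden)

for obfuscator in ["meanwhile", "whereas", "although"]:  # hypothetical candidates
    adversarial = f"{original} {obfuscator} {hidden}"
    translation = translate(adversarial)
    # Crude check: attack "succeeds" if the hidden sentence's translation
    # does not appear in the output for the adversarial input.
    succeeded = hidden_translation.lower() not in translation.lower()
    print(f"{obfuscator!r}: success={succeeded} | {translation}")
```

In the paper, the obfuscator word is chosen by an optimization over candidates rather than a fixed list, and success is measured over full test sets; this snippet only makes the input/output structure of the attack concrete.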
Submission Number: 187