Added Toxicity Mitigation at Inference Time for Multimodal and Massively Multilingual Translation

Published: 01 Jan 2024, Last Modified: 07 Oct 2024 · EAMT (1) 2024 · CC BY-SA 4.0
Abstract: Machine translation models sometimes lead to added toxicity: translated outputs may contain more toxic content than the original input. In this paper, we introduce MinTox, a novel pipeline to automatically identify and mitigate added toxicity at inference time, without further model training. MinTox leverages a multimodal (speech and text) toxicity classifier that can scale across languages. We demonstrate the capabilities of MinTox when applied to SEAMLESSM4T, a multimodal and massively multilingual machine translation system. MinTox significantly reduces added toxicity: across all domains, modalities and language directions, 25% to 95% of added toxicity is successfully filtered out, while preserving translation quality.
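As a rough illustration of the inference-time idea described in the abstract, the sketch below compares toxicity detections on the source and the translation and only triggers mitigation when toxicity is *added* by the model. The helpers `translate` and `detect_toxic_words` are hypothetical placeholders standing in for a translation system and a multilingual toxicity classifier; this is a minimal sketch of the general approach, not the actual MinTox or SEAMLESSM4T implementation.

```python
# Minimal sketch of inference-time added-toxicity mitigation.
# All callables below (translate, detect_toxic_words) are hypothetical
# placeholders, not the actual MinTox / SEAMLESSM4T API.

from typing import Callable, Set


def added_toxicity(source_toxic: Set[str], target_toxic: Set[str]) -> Set[str]:
    """Toxic items present in the translation but absent from the source."""
    return target_toxic - source_toxic


def mitigate(
    source: str,
    translate: Callable[[str, Set[str]], str],
    detect_toxic_words: Callable[[str], Set[str]],
) -> str:
    """Translate once; if the hypothesis introduces toxicity not found in
    the source, re-translate while discouraging the offending items
    (e.g. via banned-word constrained decoding)."""
    hypothesis = translate(source, set())          # unconstrained first pass
    added = added_toxicity(
        detect_toxic_words(source),
        detect_toxic_words(hypothesis),
    )
    if not added:
        return hypothesis                          # no added toxicity: keep output
    return translate(source, added)                # constrained re-decode
```

The key design point reflected here is that mitigation happens purely at inference time: the model weights are untouched, and only hypotheses flagged for added toxicity incur the extra constrained decoding pass.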