Rare but Severe Errors Induced by Minimal Deletions in English-Chinese Neural Machine Translation

Anonymous

16 Oct 2021 (modified: 05 May 2023), ACL ARR 2021 October Blind Submission, Readers: Everyone
Abstract: We examine how minimal deletions in the source text induce rare but severe errors in English-Chinese and Chinese-English Transformer-based neural machine translation. We also examine the effect of training data size on the number and types of pathological cases induced by these perturbations, finding significant variation. We find that one type of hallucination can be remedied through data preprocessing, and that in a character-based model deleting words hurts more than deleting characters, even though deleting characters introduces nonsense words.
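
As a rough illustration of what a minimal deletion in the source text could look like, the sketch below enumerates every variant of a sentence with exactly one word or one character removed. It is not the submission's procedure: the function names `single_word_deletions` and `single_char_deletions` are hypothetical, and it assumes whitespace tokenization for words and that exactly one unit is deleted per variant.

```python
from typing import List


def single_word_deletions(sentence: str) -> List[str]:
    """Return each variant of `sentence` with exactly one word removed.

    Assumption: words are whitespace-delimited, which suits English
    but not unsegmented Chinese text.
    """
    words = sentence.split()
    return [" ".join(words[:i] + words[i + 1:]) for i in range(len(words))]


def single_char_deletions(text: str) -> List[str]:
    """Return each variant of `text` with exactly one character removed."""
    return [text[:i] + text[i + 1:] for i in range(len(text))]


if __name__ == "__main__":
    # Word-level perturbations of a short English source sentence.
    for variant in single_word_deletions("the cat sat"):
        print(variant)
    # Character-level perturbations of a short Chinese source string.
    for variant in single_char_deletions("我爱你"):
        print(variant)
```

Each perturbed variant would then be translated and compared against the translation of the unperturbed source to look for pathological outputs; how such cases are detected and categorized is not specified in the abstract.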