Keywords: robustness, real-world noise, multilingual LLMs
TL;DR: LLMs are vulnerable to real-world noisy data
Abstract: A common way of interacting with Large Language Models (LLMs) is through typed text, in which users often make spelling mistakes. We investigate the effect of such mistakes on the performance of 9 language models, with parameter counts ranging from 0.2B to 13B, on 3 NLP tasks: Natural Language Inference (NLI), Named Entity Recognition (NER), and Intent Classification (IC). We run our experiments on 6 languages (English, German, French, Spanish, Hindi, and Turkish) and build a dictionary of real-world noise for each of them using Wikipedia edit history. We show that the performance gap of the studied models between clean and noisy test data, averaged across all datasets and languages, ranges from 2.3 to 4.3 absolute percentage points. In addition, mT5 models are generally more robust than BLOOM, Falcon, and BERT-like models. In particular, mT5 (13B) was the most robust on average overall, across the 3 tasks, and in 4 out of the 6 languages.
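The noise-injection setup described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the dictionary entries and the `inject_noise` helper are hypothetical, standing in for a noise dictionary mined from Wikipedia edit history that maps correct words to observed misspellings.

```python
import random

# Hypothetical noise dictionary (illustrative entries only); in the paper,
# such mappings are mined from Wikipedia edit history per language.
NOISE_DICT = {
    "because": ["becuase", "beacause"],
    "received": ["recieved"],
    "language": ["langauge"],
}

def inject_noise(sentence, noise_dict, p=0.5, seed=0):
    """Replace each word that has known misspellings with a randomly
    chosen noisy variant with probability p, mimicking typing errors."""
    rng = random.Random(seed)
    out = []
    for word in sentence.split():
        variants = noise_dict.get(word.lower())
        if variants and rng.random() < p:
            out.append(rng.choice(variants))
        else:
            out.append(word)
    return " ".join(out)
```

Applying such a function to clean test sets yields the noisy counterparts on which the clean-versus-noisy performance gap is measured.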
Submission Number: 70