Abstract: Dyslexia can affect writing, leading to distinctive patterns such as letter and homophone swapping. As a result, text produced by people with dyslexia often differs from the text typically used to train natural language processing (NLP) models, raising concerns about their effectiveness for dyslexic users. This paper examines the fairness of four commercial machine translation (MT) systems toward dyslexic text through a systematic audit using both synthetically generated dyslexic text and real writing from individuals with dyslexia. By programmatically introducing various dyslexic-style errors into the WMT dataset, we present insights into how dyslexia biases manifest in MT systems as the text becomes more dyslexic, especially with real-word errors. Our results shed light on the NLP biases affecting people with dyslexia -- a population that often relies on NLP tools as assistive technologies -- highlighting the need for more diverse data and user representation in the development of foundational NLP models.
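The programmatic error injection described in the abstract could resemble the following minimal sketch. This is an illustrative assumption, not the paper's actual implementation: the homophone list, the `error_rate` parameter, and the function names `inject_dyslexic_errors` and `swap_adjacent_letters` are all hypothetical, and the real audit presumably uses a much richer error taxonomy.

```python
import random

# Hypothetical homophone pairs producing "real-word" errors;
# the paper's actual substitution list is not given here.
HOMOPHONES = {"their": "there", "to": "too", "write": "right", "hear": "here"}

def swap_adjacent_letters(word, rng):
    """Swap two adjacent letters, a common dyslexic-style non-word error."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def inject_dyslexic_errors(sentence, error_rate=0.2, seed=0):
    """Perturb a fraction of words with homophone swaps or letter swaps."""
    rng = random.Random(seed)
    out = []
    for word in sentence.split():
        if rng.random() < error_rate:
            lower = word.lower()
            if lower in HOMOPHONES:
                out.append(HOMOPHONES[lower])        # real-word error
            else:
                out.append(swap_adjacent_letters(word, rng))  # non-word error
        else:
            out.append(word)
    return " ".join(out)
```

Varying `error_rate` would let an audit measure MT quality degradation as the text becomes progressively "more dyslexic", separating real-word errors (which remain valid vocabulary and are harder for systems to detect) from non-word letter swaps.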
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: model bias/fairness evaluation, data augmentation, bias in machine translation
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Data resources
Languages Studied: English, French, dyslexic English
Submission Number: 3150