Abstract: Dyslexia can affect writing, leading to distinctive patterns such as letter and homophone swapping. As a result, text produced by people with dyslexia often differs from the text typically used to train natural language processing (NLP) models, raising concerns about their effectiveness for dyslexic users. This paper examines the fairness of four commercial machine translation (MT) systems toward dyslexic text through a systematic audit using both synthetically generated dyslexic text and real writing from individuals with dyslexia. By programmatically introducing various dyslexic-style errors into the WMT dataset, we present insights into how dyslexia biases manifest in MT systems as the text becomes more dyslexic, especially with real-word errors. Our results shed light on the NLP biases affecting people with dyslexia -- a population that often relies on NLP tools as assistive technologies -- highlighting the need for more diverse data and user representation in the development of foundational NLP models.
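The programmatic error injection described in the abstract could resemble the following minimal sketch. This is an illustrative assumption, not the paper's actual implementation: the homophone list, the `error_rate` parameter, and the function names `inject_dyslexic_errors` and `swap_adjacent_letters` are all hypothetical, and the real audit presumably uses a much richer error taxonomy.

```python
import random

# Hypothetical homophone pairs producing "real-word" errors;
# the paper's actual substitution list is not given here.
HOMOPHONES = {"their": "there", "to": "too", "write": "right", "hear": "here"}

def swap_adjacent_letters(word, rng):
    """Swap two adjacent letters, a common dyslexic-style non-word error."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def inject_dyslexic_errors(sentence, error_rate=0.2, seed=0):
    """Perturb a fraction of words with homophone swaps or letter swaps."""
    rng = random.Random(seed)
    out = []
    for word in sentence.split():
        if rng.random() < error_rate:
            lower = word.lower()
            if lower in HOMOPHONES:
                out.append(HOMOPHONES[lower])        # real-word error
            else:
                out.append(swap_adjacent_letters(word, rng))  # non-word error
        else:
            out.append(word)
    return " ".join(out)
```

Varying `error_rate` would let an audit measure MT quality degradation as the text becomes progressively "more dyslexic", separating real-word errors (which remain valid vocabulary and are harder for systems to detect) from non-word letter swaps.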
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: model bias/fairness evaluation, data augmentation, bias in machine translation
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Data resources
Languages Studied: English, French, dyslexic English
Submission Number: 3150