A Multilingual, Culture-First Approach to Addressing Misgendering in LLM Applications

ACL ARR 2025 February Submission1651 Authors

14 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0
Abstract: Misgendering is the act of referring to someone by a gender that does not match their chosen identity. It marginalizes and undermines a person's sense of self, causing significant harm. In English, there are clear-cut strategies for avoiding misgendering, such as using the pronoun "they". However, other languages pose unique challenges due to both grammatical and cultural constructs. In this work, we develop methodologies to assess and mitigate misgendering across 42 languages and dialects, using a participatory-design approach to create effective and appropriate guardrails across all languages. We test these guardrails in a standard large language model-based application (meeting transcript summarization), where both the data generation and the annotation steps followed a human-in-the-loop approach. We find that the proposed guardrails are very effective in reducing misgendering rates in the generated summaries across all languages, without incurring a loss of quality. Our human-in-the-loop approach demonstrates a feasible method for scaling inclusive and responsible AI-based solutions across multiple languages and cultures.
Paper Type: Long
Research Area: Multilingualism and Cross-Lingual NLP
Research Area Keywords: multilingual, culture, responsible AI, evaluation, participatory design
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Data resources
Languages Studied: "Czech", "German (from Germany)", "Spanish", "French", "Italian", "Japanese", "Portuguese (Brazil)", "Chinese (standard, simplified)", "Arabic", "Danish", "Finnish", "Hebrew", "Hungarian", "Korean", "Norwegian (Bokmaal)", "Dutch", "Polish", "Portuguese (European)", "Russian", "Swedish", "Thai", "Turkish", "Ukrainian", "Chinese (standard, traditional)", "Bulgarian", "Catalan", "Welsh", "Greek", "English (UK)", "Spanish (Mexico)", "Estonian", "French (Canada)", "Croatian", "Indonesian", "Lithuanian", "Latvian", "Romanian", "Slovak", "Slovenian", "Serbian", "Vietnamese", "German (Swiss standard)", "Flemish"
Submission Number: 1651