Building NLP Evaluation Resources with LLMs and Community Engagement for Scale and Depth

01 Aug 2023 (modified: 07 Dec 2023) — DeepLearning Indaba 2023 Conference Submission
Keywords: stereotypes, NLP, evaluation
TL;DR: Complementary approaches towards stereotype benchmark building
Abstract: Measurements of fairness in NLP have been critiqued for lacking concrete definitions of the biases or harms being measured, and for perpetuating a singular, Western narrative of fairness globally. Current approaches that combat this issue through curation of resources face the significant challenge of achieving coverage of global cultures and perspectives at scale. In this paper, we demonstrate the utility and importance of complementary curation strategies that leverage both large generative models and community engagement. We specifically target the harm of stereotyping and demonstrate a pathway to build a benchmark that covers stereotypes about diverse and intersectional identities. We discuss the two approaches, their advantages and constraints, and the characteristics of the data they produce. We further discuss their potential to be used complementarily for better evaluation of stereotyping harms, in particular for the African context.
Submission Category: Machine learning algorithms
Submission Number: 82