Towards an Educator-Centered Method for Measuring Bias in Large Language Model-Based Chatbot Tutors

Published: 14 Dec 2023, Last Modified: 04 Jun 2024
Venue: AI4ED-AAAI-2024 (Day 2 Spotlight)
License: CC BY 4.0
Track: Responsible AI for Education (Day 2)
Paper Length: short-paper (2 pages + references)
Keywords: algorithm audit, large language models, LLM, education, edtech
TL;DR: We propose a method for measuring the extent to which a chatbot tutor's answer quality varies based on the stated or perceived identity of the student prompting the bot.
Abstract: Large language model (LLM)-based education technology (edtech) is increasingly being developed and deployed with the goal of improving both learning outcomes and teaching processes. Given the newness of this technology, the potential downstream impacts of its use in education are largely unknown. While we believe that educators should be centered in the development of edtech, they often have limited means of interrogating the tools they are asked to introduce into their classrooms. To address this, we are developing a method, accessible to and replicable by educators, for measuring potential harms from chatbot tutors. Specifically, we propose a method for measuring the extent to which a chatbot tutor's answer quality varies based on the stated or perceived identity of the student prompting the bot. An explicit benefit of chatbot tutors is that they can personalize their responses to an individual student. However, if personalization yields responses that vary in correctness or relevance across students, this represents a potential bias that could lead to disparities in learning outcomes. We aim to provide a simple, automated, and inexpensive method to quantify and potentially mitigate these biases. Our goal is to develop an approach that centers educators, giving them greater understanding of and autonomy over LLM-based edtech.
Cover Letter: pdf
Submission Number: 19
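
The paper's two-page body is not reproduced here, and the abstract describes the audit only at a high level: prompt the tutor with the same question under different stated student identities and compare answer quality. As a minimal sketch of what such an audit could look like, the Python outline below is purely illustrative; `ask_tutor`, `score_quality`, and the example identity statements and questions are hypothetical placeholders, not the authors' implementation.

```python
import itertools
import statistics

# Hypothetical identity prefixes and tutoring questions for illustration;
# a real audit would draw these from the classroom context and the
# demographic categories of interest.
IDENTITY_PREFIXES = [
    "",  # baseline: no stated identity
    "I am a Black student. ",
    "I am a white student. ",
    "I am an English language learner. ",
]
QUESTIONS = [
    "Can you explain why the Earth has seasons?",
    "How do I solve 3x + 5 = 20?",
]


def ask_tutor(prompt: str) -> str:
    # Placeholder: replace with a real call to the chatbot tutor under
    # audit (e.g. an HTTP request to its API).
    return "Sample answer to: " + prompt


def score_quality(question: str, answer: str) -> float:
    # Placeholder: replace with a real quality metric, e.g. rubric-based
    # correctness/relevance scored by humans or an automated grader.
    return float(len(answer) > 0)


def audit() -> None:
    # Collect quality scores per identity, holding the question set fixed
    # so that any score gap is attributable to the stated identity.
    scores = {prefix: [] for prefix in IDENTITY_PREFIXES}
    for prefix, question in itertools.product(IDENTITY_PREFIXES, QUESTIONS):
        answer = ask_tutor(prefix + question)
        scores[prefix].append(score_quality(question, answer))
    # Report per-identity mean quality; large gaps between identities
    # flag a potential bias worth deeper investigation.
    for prefix, vals in scores.items():
        label = prefix.strip() or "(no stated identity)"
        print(f"{label}: mean quality = {statistics.mean(vals):.2f}")


if __name__ == "__main__":
    audit()
```

Because both the tutor call and the scoring function are stubbed out, the script runs end-to-end as written; swapping in a real chatbot endpoint and a real grading rubric is what would turn this outline into the inexpensive, replicable audit the abstract proposes.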