Lessons Learned About Transparency, Fairness, and Explainability from Two Automated Scoring Challenges
Track: Responsible AI for Education (Day 2)
Paper Length: short-paper (2 pages + references)
Keywords: automated scoring, math constructed response, responsible ai, fairness analyses
TL;DR: We describe results from two data challenges with respect to transparency, fairness, and interpretability.
Abstract: This paper describes the results of two automated scoring challenges conducted as research studies to evaluate the feasibility of automated scoring for fourth- and eighth-grade reading and math short constructed responses. The challenges demonstrated that these responses could be scored nearly as accurately as by human raters, with math items scored more accurately than reading items. Challenge review criteria included a required technical report that made the approach used explainable, interpretable, and transparent. In addition, both challenges required participants to demonstrate that their innovation was fair and did not introduce additional bias in scoring in order for submissions to be considered valid entries. In both the reading and math challenges, no bias was found for major demographic groups of race/ethnicity or gender. In both challenges, participants described their feature engineering process as well as their process of designing and testing their model of interest; however, they did not provide interpretable models because they relied on Large Language Models with thousands or millions of parameters to represent the student text. This paper describes the fairness and transparency/interpretability results as well as some suggested future directions for the field.
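As an illustration of the kind of subgroup bias check the challenges required, the sketch below compares machine scores to human scores within each demographic group using quadratic weighted kappa and a standardized mean difference. This is a minimal sketch, not the participants' actual analysis: the DataFrame and its columns (human_score, machine_score, group) are hypothetical, and the paper does not specify the exact metrics or cutoffs used.

    # Hedged sketch of a subgroup fairness check for automated scoring.
    # Assumes a pandas DataFrame `df` with hypothetical columns
    # `human_score`, `machine_score`, and `group` (e.g., race/ethnicity or gender).
    import pandas as pd
    from sklearn.metrics import cohen_kappa_score

    def subgroup_fairness_report(df: pd.DataFrame) -> pd.DataFrame:
        rows = []
        for group, sub in df.groupby("group"):
            # Agreement with human raters within the subgroup.
            qwk = cohen_kappa_score(sub["human_score"], sub["machine_score"],
                                    weights="quadratic")
            # Standardized mean difference: machine minus human scores,
            # scaled by the standard deviation of human scores overall.
            smd = (sub["machine_score"] - sub["human_score"]).mean() / df["human_score"].std()
            rows.append({"group": group, "n": len(sub), "qwk": qwk, "smd": smd})
        return pd.DataFrame(rows)

A common rule of thumb in the automated scoring literature flags subgroups where the absolute standardized mean difference exceeds roughly 0.10; whether the challenges used this or another criterion is not stated in the abstract.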
Submission Number: 29