Abstract: Automated scoring of open-ended student responses has the potential to significantly reduce human grader effort. Recent advances in automated scoring leverage textual representations from pre-trained language models such as BERT. Existing approaches train a separate model for each item/question, which is suitable for scenarios like essay scoring where items differ from one another. However, these approaches have two limitations: 1) they fail to leverage item linkage in scenarios such as reading comprehension, where multiple items may share a reading passage; 2) they are not scalable, since storing one model per item is difficult with large language models. We report our (grand prize-winning) solution to the National Assessment of Educational Progress (NAEP) automated scoring challenge for reading comprehension. Our approach, in-context BERT fine-tuning, produces a single shared scoring model for all items, with a carefully designed input structure that provides contextual information on each item. Our experiments demonstrate the effectiveness of our approach, which outperforms existing methods. We also perform a qualitative analysis and discuss the limitations of our approach. (Full version of the paper: https://arxiv.org/abs/2205.09864. Our implementation: https://github.com/ni9elf/automated-scoring)
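The sketch below illustrates the general idea of scoring with a single shared BERT model by packing item context into the input alongside the student response; the exact input structure used in the paper is described in the full version, and the item text, response, and score range here are hypothetical placeholders.

```python
# Minimal sketch of in-context scoring with a single shared BERT classifier.
# Assumes HuggingFace transformers; the example item, response, and num_labels
# are illustrative, not taken from the NAEP challenge data.
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=4)

item_context = "Question: Why does the narrator return to the lighthouse?"  # hypothetical item
student_response = "Because she promised her father she would come back."   # hypothetical response

# One shared model for all items: the item context is encoded as the first
# segment and the response as the second, so the same weights can score
# responses to many different items.
enc = tokenizer(item_context, student_response, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits
predicted_score = logits.argmax(dim=-1).item()
print(predicted_score)
```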