Towards Interpretable Machine Reading Comprehension with Mixed Effects Regression and Exploratory Prompt Analysis

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission
TL;DR: Mixed effects models offer advantages in evaluating language models' reading comprehension ability.
Abstract: We investigate the properties of natural language prompts that determine their difficulty in machine reading comprehension (MRC) tasks. While much work has been done benchmarking language model (LM) performance at the task level, considerably less literature examines how individual task items can enhance interpretability for MRC. We perform a mixed effects analysis of the behavior of three major LMs, comparing their performance on a large multiple-choice MRC task to explain the relationship between predicted accuracy and different prompt features. First, we observe a divergence in LM accuracy as the prompt's token count grows: overall stronger LMs increase in accuracy while overall weaker LMs decrease. Second, all LMs exhibit consistent accuracy gains with increasing syntactic complexity. Third, a post hoc analysis reveals that the most difficult prompts have the greatest ability to discriminate between different LMs, suggesting their outsized usefulness in MRC evaluation methods.
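The abstract describes a mixed effects regression relating LM accuracy to prompt features such as token count and syntactic complexity, with LM identity as a grouping factor. The paper's exact model specification and data are not given here, so the following is a minimal sketch of the general approach using `statsmodels`' linear mixed effects API on synthetic data; the feature names (`token_count`, `syntax_depth`) and effect sizes are hypothetical, and a random slope on `token_count` per LM would be one way to capture the divergence the abstract reports.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic item-level data standing in for MRC results (hypothetical).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "lm": rng.choice(["lm_a", "lm_b", "lm_c"], size=n),      # which LM answered the item
    "token_count": rng.integers(20, 200, size=n),            # prompt length feature
    "syntax_depth": rng.integers(1, 10, size=n),             # syntactic complexity feature
})
# Continuous accuracy proxy with small positive effects for both features.
df["accuracy"] = (
    0.5 + 0.001 * df["token_count"] + 0.02 * df["syntax_depth"]
    + rng.normal(0, 0.05, size=n)
).clip(0, 1)

# Random intercept per LM; prompt features as fixed effects.
# (A random slope, e.g. re_formula="~token_count", could model the
# per-LM divergence in the length effect described in the abstract.)
model = smf.mixedlm("accuracy ~ token_count + syntax_depth", df, groups=df["lm"])
result = model.fit()
print(result.params)
```

The fixed-effect coefficients estimate how each prompt feature shifts expected accuracy across all LMs, while the group variance term captures baseline ability differences between models.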
Paper Type: long
Research Area: Interpretability and Analysis of Models for NLP
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English