Can LLMs Solve Reading Comprehension Tests as Second Language Learners?

Published: 29 Jun 2024, Last Modified: 07 Jul 2024 · KiL 2024 Poster · CC BY 4.0
Keywords: Natural Language Processing, Large Language Models, Question Answering, Reading Comprehension
Abstract: The manual evaluation of natural language processing systems is costly and time-consuming, especially when the target evaluators must have specific attributes. Current large language models (LLMs) are reported to outperform humans at various tasks and have recently been used as substitutes for human evaluators. LLMs have also shown the ability to behave as specified in a prompt. This progress raises a fundamental question: can LLMs mimic the behavior of language learners? In this study, we intentionally weaken LLMs so that they simulate second language learners on multiple-choice reading comprehension tests. By comparing answer distributions from language learners and LLMs, we observe that prompts designed to weaken the LLMs indeed degrade their performance. However, this degradation does not bridge the gap between the original LLMs and language learners, thereby highlighting a critical discrepancy between them.
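The sketch below is a minimal illustration (not the paper's actual prompts, models, or data) of the two steps the abstract describes: prompting an LLM with a persona intended to weaken it toward a second-language-learner level, then comparing its answer distribution on multiple-choice items with a learner answer distribution. The `ask_llm` callable, the persona wording, and the learner distribution numbers are all hypothetical placeholders.

```python
# Hedged sketch: weaken an LLM via a learner persona prompt, then compare
# answer distributions. All prompts, names, and numbers here are assumptions.

from collections import Counter
from typing import Callable, Dict, List

# Hypothetical weakening persona (the paper's actual prompts are not reproduced here).
LEARNER_PERSONA = (
    "You are an intermediate learner of English as a second language. "
    "Answer the multiple-choice question with a single letter (A-D). "
    "You may misread difficult vocabulary or complex sentences."
)

def answer_item(ask_llm: Callable[[str], str], passage: str, question: str,
                options: List[str], weaken: bool = True) -> str:
    """Build a prompt (optionally prefixed with the weakening persona) and return a choice letter."""
    persona = LEARNER_PERSONA + "\n\n" if weaken else ""
    option_text = "\n".join(f"{chr(65 + i)}. {o}" for i, o in enumerate(options))
    prompt = f"{persona}Passage:\n{passage}\n\nQuestion: {question}\n{option_text}\nAnswer:"
    return ask_llm(prompt).strip()[:1].upper()

def answer_distribution(answers: List[str]) -> Dict[str, float]:
    """Normalize raw answer counts into a probability distribution over A-D."""
    counts = Counter(answers)
    total = sum(counts.values())
    return {c: counts.get(c, 0) / total for c in "ABCD"}

def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance between two answer distributions (0 = identical)."""
    return 0.5 * sum(abs(p[c] - q[c]) for c in "ABCD")

if __name__ == "__main__":
    # Dummy stand-in for an actual LLM call so the sketch runs end to end.
    dummy_llm = lambda prompt: "B"
    # Illustrative learner distribution only; the paper uses real learner answer data.
    learner_dist = {"A": 0.30, "B": 0.40, "C": 0.20, "D": 0.10}
    llm_answers = [answer_item(dummy_llm, "Sample passage.", "Sample question?",
                               ["opt1", "opt2", "opt3", "opt4"]) for _ in range(50)]
    llm_dist = answer_distribution(llm_answers)
    print("LLM answer distribution:", llm_dist)
    print("Distance to learner distribution:", total_variation(learner_dist, llm_dist))
```

A smaller distance between the two distributions would indicate a closer match to learner behavior; the abstract's finding is that the weakening prompt lowers accuracy but does not close this gap.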
Submission Number: 7