No Question, No Passage, No Problem: Investigating Artifact Exploitation and Reasoning in Multiple-Choice Reading Comprehension
Keywords: Large Language Models, Dataset Artifacts, Multiple-Choice Reading Comprehension, Machine Reading Comprehension, Multiple-Choice QA, Partial-Input Prompting
TL;DR: We show that large language models consistently surpass majority baselines in multiple-choice reading comprehension even without passages or questions, revealing reasoning strategies beyond simple artifact exploitation.
Abstract: Large language models (LLMs) can perform above the majority baseline on NLP tasks even when deprived of parts of the input, raising concerns that benchmarks reward artifacts rather than reasoning. Prior work has demonstrated this phenomenon in multiple-choice QA and natural language inference, but not in multiple-choice reading comprehension (MCRC), where both the passage and the question are integral to the task. We study MCRC under a stricter ablation, removing both the passage and the question so that only the answer options remain, and evaluate closed-source LLMs in a zero-shot setting. Despite this severe ablation, models consistently exceed majority baselines across five benchmarks. To probe how such accuracy arises, we introduce two reasoning-based strategies: process-of-elimination, which iteratively discards distractors, and abductive passage inference, which infers a plausible context to justify an option. Both strategies closely track choices-only accuracy, suggesting that strong performance reflects genuine reasoning procedures rather than artifacts alone. These findings motivate the study of broader reasoning strategies under ablation as a tool for disentangling shallow cues from structured inference in modern LLMs.
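A minimal sketch of the choices-only ablation and the two probing strategies described in the abstract, assuming a generic zero-shot chat model. The `call_llm` wrapper and all prompt wordings are hypothetical illustrations, not the authors' exact prompts or evaluation code.

```python
# Illustrative sketch only: the `call_llm` wrapper and prompt wordings are
# hypothetical assumptions, not the paper's actual protocol.

def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around a closed-source chat LLM (zero-shot)."""
    raise NotImplementedError("plug in your preferred API client here")


def _lettered(options: list[str]) -> str:
    # Label the answer options A, B, C, ... on separate lines.
    return "\n".join(f"{chr(65 + i)}. {o}" for i, o in enumerate(options))


def choices_only_prompt(options: list[str]) -> str:
    # Partial-input ablation: passage and question removed, options only.
    return ("Choose the most likely correct answer. Reply with one letter.\n"
            + _lettered(options))


def process_of_elimination_prompt(options: list[str]) -> str:
    # Iteratively discard the least plausible distractors, then answer.
    return ("Eliminate the least plausible option one at a time, briefly "
            "justifying each elimination, then give the remaining letter.\n"
            + _lettered(options))


def abductive_passage_prompt(options: list[str]) -> str:
    # Infer a context (passage and question) that would justify one option.
    return ("Imagine a short passage and question for which exactly one of "
            "these options is correct. Sketch that context, then give the "
            "letter it supports.\n" + _lettered(options))


if __name__ == "__main__":
    opts = ["a sense of regret", "the narrator's childhood home",
            "to warn the villagers", "because the bridge had collapsed"]
    print(choices_only_prompt(opts))  # inspect the ablated, options-only prompt
```

Under this sketch, choices-only accuracy on each benchmark would then be compared against the majority baseline, with the two strategy prompts used to probe how that accuracy arises.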
Submission Number: 169