Disentangling Form and World Knowledge in LLM Interpretation: Evidence from Quantifier Scope Disambiguation
Keywords: quantifier scope disambiguation (QSD), large language models (LLMs), world knowledge, semantic interpretation, retrieval-augmented generation (RAG)
Abstract: We investigate how large language models (LLMs) construct meaning using Quantifier Scope Disambiguation (QSD) as a controlled probing task. We introduce (i) a balanced English corpus designed to neutralize classical surface heuristics in QSD, and (ii) a pseudosentence dataset that removes real-world referents to isolate formal cues. In Experiment 1, we evaluate a range of LLMs in a zero-shot question-answering setup and compare them to human baselines. While models achieve high accuracy on the balanced corpus, performance drops substantially on pseudosentences, with the largest degradation for inverse-scope readings. This pattern indicates that surface-level cues alone are insufficient to explain model behavior and suggests a substantive contribution of implicit world knowledge to LLM interpretation. In Experiment 2, we manipulate access to external world knowledge via retrieval-augmented generation (RAG), while keeping the task and prompt fixed. RAG yields only limited gains in overall accuracy, but these effects are highly selective: they primarily affect inverse-scope interpretations, and most clearly the hardest configuration, where classical surface predictors conflict with the preferred reading. Taken together, our results suggest that LLM interpretive behavior reflects an interaction between world knowledge and formal interpretive pressures encoded in the input, with world knowledge—implicit or retrieved—playing a particularly important role when formal cues are insufficient to yield a preferred reading. This pattern partially parallels, but does not fully match, human scope interpretation.
Paper Type: Long
Research Area: Semantics: Lexical, Sentence-level Semantics, Textual Inference and Other areas
Research Area Keywords: compositionality, probing, robustness
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 2014