Processing Gaps: Explaining Human-Model Misalignment in Filler-Gap Dependency Parsing

Published: 03 Oct 2025 | Last Modified: 13 Nov 2025 | CPL 2025 Poster | CC BY 4.0
Keywords: Maze Task, Non-Local Dependencies, Relative Clause, Surprisal
TL;DR: LMs do not predict human subject gap preferences in non-local dependencies, potentially because of an impoverished syntactic representation of the input.
Abstract: Recent theories of human parsing emphasise the contribution of both contextual (un)predictability and memory demands to a word’s processing difficulty [1,2]. Temporarily ambiguous sentences which terminate in the less frequent continuation generate reading-time (RT) slowdowns of much greater magnitude than predicted by language models’ (LM) estimates of next-word probability [1]. One explanation is that limited memory encourages human parsers to commit to a single structural interpretation, resulting in costly reanalysis when that interpretation is disconfirmed by later input [1]. Although LMs can represent multiple readings of ambiguous sentences [3], even LMs with limited parallelism fail to account for human difficulty in these ‘garden-path’ structures [4]. This raises the possibility that corpus data are not accurately reflected in a model’s learned probabilities of competing continuations [4], perhaps due to an impoverished syntactic representation. We ask whether human parsing preferences are predicted by LM probability estimates in structures with high memory demands that do not require revision.

Human data come from a novel implementation of the Maze Task [5], in which participants chose between relative clause (RC) continuations that unambiguously signal a subject gap (e.g. should) or an object gap (the). The selection was made either non-locally, at a choice point within a complement clause of the RC verb (Exp. 1, n=76), or locally, immediately following the relativiser (Exp. 2, n=62) (Table 1). Humans demonstrated a strong subject gap preference non-locally (0.91) and a weaker one locally (0.66) [5]. This aligns with frequency counts in the Penn Treebank (PTB): using a Tregex search over constituent structure (without specifying lexical items), we found a higher subject gap probability non-locally (1.00) than locally (0.83) (Table 2). In the local condition, all remaining trees contained object gaps, so other available parses are too infrequent to constitute serious competitors. Surprisal [6] estimates were then generated with GPT-2 (small) for the words signaling the subject vs. object gap continuations (should and the). To facilitate linking without the transformation assumptions relied on previously (e.g. [1]), we converted the surprisal values into the relative probability of the subject gap parse (pSgap; Equation 1), placing them on the same scale as our human data. pSgap predicted the subject gap preference locally (p<.001) but did not predict the strength of the preference non-locally (p=.426), assigning a similar probability to the subject gap continuation across dependency lengths (Table 3).
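For concreteness, a minimal sketch of how such surprisal estimates and the conversion to pSgap might be computed with GPT-2 (small) via the Hugging Face transformers library is given below. The sentence frame, tokenisation details, and exact form of Equation 1 are not reproduced in this abstract, so the item and the renormalisation step are illustrative assumptions rather than the authors' implementation.

```python
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def word_surprisal(context: str, word: str) -> float:
    """Surprisal (in nats) of `word` given `context`, summed over its subword tokens."""
    ids = tokenizer(context, return_tensors="pt").input_ids
    total = 0.0
    for tok in tokenizer(" " + word).input_ids:  # leading space for GPT-2's BPE
        with torch.no_grad():
            log_probs = torch.log_softmax(model(ids).logits[0, -1], dim=-1)
        total += -log_probs[tok].item()
        ids = torch.cat([ids, torch.tensor([[tok]])], dim=1)
    return total

# Hypothetical non-local choice point inside the relative clause (illustrative
# frame only, not an item from the study): "should" signals a subject gap,
# "the" signals an object gap.
context = "The nurse who the doctor said"
s_subj = word_surprisal(context, "should")
s_obj = word_surprisal(context, "the")

# Assumed form of Equation 1: renormalise the two continuation probabilities so
# that pSgap is the relative probability of the subject-gap parse.
p_sgap = math.exp(-s_subj) / (math.exp(-s_subj) + math.exp(-s_obj))
print(f"pSgap = {p_sgap:.3f}")
```

Because the two continuations are renormalised against each other, the choice of log base cancels out, and pSgap lies on the same 0–1 scale as the human choice proportions.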
It is expected that the human preferences and corpus data align in showing a higher subject gap probability non-locally, given that human parsers are motivated to integrate the filler early, in both comprehension and production, to reduce memory demands [7]. The LM's failure to predict the non-local data could result from human-like memory constraints not being faithfully reflected in the model architecture, or from the absence of an explicit syntactic representation of the input. Prior attempts to combine syntactic and lexical surprisal have shown limited success in approximating human data for complex sentences [8], including movement dependencies with multiple embeddings [9], but these models were trained on relatively ‘superficial’ representations of syntax.

Movement dependencies rely on hierarchical relations and may require an LM which explicitly represents gaps. A promising direction for future work is to model these effects using an LM provided with more sophisticated syntactic supervision, to disentangle the contribution of structural representations from that of memory constraints.
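For illustration, the corpus comparison reported above (Table 2) can be approximated as in the sketch below. The study used a Tregex query over the full Penn Treebank; this version instead uses the small PTB sample bundled with NLTK and a simplified criterion (a wh-trace in NP-SBJ position of a wh-relative clause), without the local vs. non-local split, so its counts are not the reported ones.

```python
import nltk
from nltk.corpus import treebank

nltk.download("treebank", quiet=True)  # 10% PTB sample shipped with NLTK

def is_wh_rc(sbar):
    """SBAR headed by a WH-phrase, i.e. a candidate relative clause."""
    return any(isinstance(c, nltk.Tree) and c.label().startswith("WH") for c in sbar)

def has_subject_trace(node):
    """True if an NP-SBJ inside `node` immediately dominates a wh-trace (*T*)."""
    for np in node.subtrees(lambda t: t.label().startswith("NP-SBJ")):
        for child in np:
            if (isinstance(child, nltk.Tree) and child.label() == "-NONE-"
                    and child.leaves()[0].startswith("*T*")):
                return True
    return False

subj, other = 0, 0
for tree in treebank.parsed_sents():
    for sbar in tree.subtrees(lambda t: t.label() == "SBAR"):
        if not is_wh_rc(sbar):
            continue
        if has_subject_trace(sbar):
            subj += 1
        else:
            other += 1

print(f"subject-gap RCs: {subj}, other RCs: {other}, "
      f"p(subject gap) = {subj / (subj + other):.2f}")
```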
Submission Number: 19