L1 Acquisition in Telicity: Connecting Linguistic Cues and LLM-Based Surprisal
Keywords: L1 acquisition, telicity, LLM surprisal, syntax, semantics, learning
Abstract: Acquiring telicity, the distinction between bounded and unbounded events (Krifka, 1998), requires L1 learners to map surface-level and semantic cues to abstract event structures. While theoretical frameworks like the Transparency Principle (Lightfoot, 2017; Wagner, 2006) suggest children acquire these mappings in non-uniform stages, the exact computational trajectory remains under-explored. Prior computational modeling has tested transformer sensitivity (Zhao et al., 2021) and distributional semantics (Kim et al., 2024), yet these approaches often lack the structured syntactic features required to track complex compositional telicity. This study bridges formal linguistic theories with naturalistic empirical data to map the developmental trajectory of these cues.
We analyze English child speech and adult child-directed speech extracted from the CHILDES database (MacWhinney, 2000). We develop a computational pipeline that uses LLM-based token surprisal (GPT-2) to diagnose telicity through the relative probability of temporal adverbials (e.g., in an hour vs. for an hour). After validating this metric against the judgments of three expert linguists, achieving an 88.8% lenient accuracy, we train interpretable diagnostic classifiers (Logistic Regression) to predict telicity labels. The models are trained on 12 features spanning syntactic configurations (e.g., post-verbal determiners), morphosyntactic markers, and lexical semantics (Verb Class).
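The adverbial diagnostic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: `score` is a hypothetical callable that returns natural-log probabilities for the tokens of a continuation given a context (the role GPT-2 plays in the study), `toy_score` is a stand-in with invented numbers, and the `margin` parameter and the "ambiguous" label are illustrative additions.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical scorer type: (context, continuation) -> natural-log
# probability of each continuation token given everything before it.
Scorer = Callable[[str, str], List[float]]


def surprisal_bits(token_logprobs: List[float]) -> float:
    """Total surprisal of a continuation, in bits."""
    return -sum(token_logprobs) / math.log(2)


def diagnose_telicity(clause: str, score: Scorer,
                      margin: float = 0.0) -> Tuple[str, float]:
    """Compare the surprisal of 'in an hour' and 'for an hour' after a clause.

    A positive difference means 'in an hour' is the less surprising
    continuation, which is read here as evidence for a telic clause.
    """
    s_in = surprisal_bits(score(clause, " in an hour"))
    s_for = surprisal_bits(score(clause, " for an hour"))
    diff = s_for - s_in
    if diff > margin:
        return "telic", diff
    if diff < -margin:
        return "atelic", diff
    return "ambiguous", diff


def toy_score(context: str, continuation: str) -> List[float]:
    """Stand-in scorer with invented log-probabilities (assumption:
    a real scorer would query a language model instead)."""
    table = {
        ("She built the house", " in an hour"): [-1.0, -0.5, -0.5],
        ("She built the house", " for an hour"): [-3.0, -0.5, -0.5],
        ("She pushed the cart", " in an hour"): [-3.5, -0.5, -0.5],
        ("She pushed the cart", " for an hour"): [-1.5, -0.5, -0.5],
    }
    return table[(context, continuation)]


for clause in ("She built the house", "She pushed the cart"):
    label, diff = diagnose_telicity(clause, toy_score)
    print(f"{clause!r}: {label} (surprisal difference {diff:.2f} bits)")
```

Keeping the scorer as a parameter separates the diagnostic logic from the choice of language model, so the comparison can be checked on toy values before a real model is plugged in.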
Our analysis identifies a stark developmental shift from syntax to semantics. The child model demonstrates near-perfect classification performance driven almost exclusively by local syntactic configurations, relying on post-verbal determiners as a high-precision heuristic. By contrast, this “Determiner Clue” is entirely neutralized in the adult model, which instead heavily weights the abstract, inherent aspectual semantics of the verb.
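One way to read such a contrast off an interpretable classifier is to compare fitted logistic regression coefficients across two training sets. The sketch below is illustrative only: the two feature names, the +1/-1 coding, and both toy datasets are assumptions made for the example and are not data or results from the study, and the hand-written gradient descent stands in for whatever fitting routine was actually used.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows: Sequence[Sequence[float]], labels: Sequence[int],
                 lr: float = 0.5, steps: int = 500) -> Tuple[List[float], float]:
    """Unregularized logistic regression fitted by batch gradient descent."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(steps):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log-loss w.r.t. the logit
            grad_b += err
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Illustrative feature names; each feature is coded +1 (present) or -1 (absent).
FEATURES = ["postverbal_determiner", "telic_verb_class"]
rows = [(1, 1), (1, -1), (-1, 1), (-1, -1)] * 5

# Toy set A: the telic label (1) tracks the determiner feature only.
det_driven_labels = [1, 1, 0, 0] * 5
# Toy set B: the telic label (1) tracks the verb-class feature only.
verb_driven_labels = [1, 0, 1, 0] * 5

det_w, _ = fit_logistic(rows, det_driven_labels)
verb_w, _ = fit_logistic(rows, verb_driven_labels)

for name, w in (("toy set A", det_w), ("toy set B", verb_w)):
    ranked = sorted(zip(FEATURES, w), key=lambda fw: abs(fw[1]), reverse=True)
    print(name, [(f, round(v, 3)) for f, v in ranked])
```

On these toy sets the feature that determines the label receives a large positive weight and the other stays near zero, which is the kind of pattern a weight comparison between two fitted models is meant to expose.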
These findings offer robust quantitative support for the Syntactic Bootstrapping Hypothesis (Gleitman, 1990; Wagner, 2006). The data suggest that children initially bootstrap their understanding of event boundedness through surface-level syntactic cues before shifting to the deeper, compositional semantic integration used by adults.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 85