Modelling Child Learning and Parsing of Long-range Syntactic Dependencies

Louis Mahon, Mark Johnson, Mark Steedman

Published: 2025, Last Modified: 26 Jul 2025, CoRR 2025, CC BY-SA 4.0

Abstract: This work develops a probabilistic child language acquisition model to learn a range of linguistic phenomena, most notably long-range syntactic dependencies of the sort found in object wh-questions, among other constructions. The model is trained on a corpus of real child-directed speech, where each utterance is paired with a logical form as a meaning representation. It then learns both word meanings and language-specific syntax simultaneously. After training, the model can deduce the correct parse tree and word meanings for a given utterance-meaning pair, and can infer the meaning if given only the utterance. The successful modelling of long-range dependencies is theoretically important because it exploits aspects of the model that are, in general, trans-context-free.