Keywords: inductive bias, language models, linguistic typology
TL;DR: We introduce GCG-based artificial languages that cover previously overlooked constructions, such as VSO/OSV word orders and object relative clauses. We re-evaluate LMs' inductive biases with our GCG-based data.
Abstract: Recent work has used artificial languages to investigate whether extant neural language models (LMs) have an inbuilt inductive bias towards acquiring attested, typologically frequent grammatical patterns as opposed to infrequent, unattested, or impossible ones (White and Cotterell, 2021; Kuribayashi et al., 2024). Artificial languages make it easier to isolate specific grammatical properties from other factors such as lexical or real-world knowledge, but they also risk oversimplifying the problem.
In this paper, we examine the use of Generalized Categorial Grammars (GCGs) (Wood, 2014) as a general framework to create artificial languages with a wider range of attested word order patterns, including those where the subject intervenes between verb and object (VSO, OSV), and with unbounded dependencies in object relative clauses.
In our experiments, we exemplify our approach by extending White and Cotterell (2021) and report some significant differences from existing results.
Supplementary Material: pdf
Submission Number: 205