GCG-Based Artificial Languages for Evaluating Inductive Biases of Neural Language Models

Nadine El-Naggar; Tatsuki Kuribayashi; Ted Briscoe

Published: 24 May 2025, Last Modified: 18 Jun 2025, Venue: CoNLL 2025, License: CC BY 4.0

Keywords: inductive bias, language models, linguistic typology

TL;DR: We introduce GCG-based artificial languages that cover previously overlooked constructions, such as VSO/OSV word orders and object relative clauses. We re-evaluate LMs' inductive biases with our GCG-based data.

Abstract: Recent work has used artificial languages to investigate whether extant neural language models (LMs) have an inbuilt inductive bias towards acquiring attested, typologically frequent grammatical patterns as opposed to infrequent, unattested, or impossible ones (White and Cotterell, 2021; Kuribayashi et al., 2024). Artificial languages make it easier to isolate specific grammatical properties from other factors such as lexical or real-world knowledge, but they also risk oversimplifying the problem. In this paper, we examine the use of Generalized Categorial Grammars (GCGs) (Wood, 2014) as a general framework for creating artificial languages with a wider range of attested word order patterns, including those where the subject intervenes between verb and object (VSO, OSV), as well as unbounded dependencies in object relative clauses. In our experiments, we exemplify our approach by extending White and Cotterell (2021) and report some significant differences from existing results.
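To make the word order idea concrete, the following is a minimal sketch of how a categorial lexicon can encode each of the six basic orders in the category of the transitive verb alone. It is not the paper's GCG implementation: it uses only forward and backward function application, and the names `NPs`, `NPo`, `VERB_CATEGORY`, `accepts`, and `accepted_orders` are illustrative assumptions, as is the choice to distinguish subject and object noun phrases by separate atomic categories.

```python
from itertools import permutations, product

# Toy basic categorial grammar (application only); an assumption-laden sketch,
# not the grammar used in the paper.
# A category is an atomic string ("S", "NPs", "NPo") or a tuple
# (result, slash, argument): "/" seeks the argument to the right,
# "\\" seeks it to the left.
def fwd(result, arg):
    return (result, "/", arg)

def bwd(result, arg):
    return (result, "\\", arg)

# Hypothetical verb categories, one per word order. NPs / NPo stand for
# subject-marked and object-marked noun phrases.
VERB_CATEGORY = {
    "SVO": fwd(bwd("S", "NPs"), "NPo"),
    "SOV": bwd(bwd("S", "NPs"), "NPo"),
    "VSO": fwd(fwd("S", "NPo"), "NPs"),  # subject intervenes between V and O
    "VOS": fwd(fwd("S", "NPs"), "NPo"),
    "OSV": bwd(bwd("S", "NPo"), "NPs"),  # subject intervenes between O and V
    "OVS": bwd(fwd("S", "NPs"), "NPo"),
}

def combine(left, right):
    """Return the categories obtained by applying one neighbour to the other."""
    out = []
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        out.append(left[0])   # forward application:  X/Y  Y  => X
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        out.append(right[0])  # backward application: Y  X\Y => X
    return out

def derives_sentence(categories):
    """CYK-style chart parse: True if the category sequence reduces to S."""
    n = len(categories)
    if n == 0:
        return False
    chart = {(i, i + 1): {c} for i, c in enumerate(categories)}
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            cell = set()
            for mid in range(start + 1, end):
                for l, r in product(chart[(start, mid)], chart[(mid, end)]):
                    cell.update(combine(l, r))
            chart[(start, end)] = cell
    return "S" in chart[(0, n)]

def accepts(order, sequence):
    """Check a string over {S, V, O} against the lexicon for `order`."""
    lexicon = {"S": "NPs", "O": "NPo", "V": VERB_CATEGORY[order]}
    return derives_sentence([lexicon[symbol] for symbol in sequence])

def accepted_orders(order):
    """All permutations of S, V, O that the lexicon for `order` accepts."""
    return {"".join(p) for p in permutations("SVO") if accepts(order, "".join(p))}

if __name__ == "__main__":
    for order in VERB_CATEGORY:
        print(order, sorted(accepted_orders(order)))
```

In this sketch the VSO and OSV entries need no special treatment: the verb simply consumes the subject first and the object second, which is the kind of pattern the abstract describes as missing from earlier artificial languages.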

Submission Number: 205
