Abstract: Language models learn rare syntactic phenomena, but it has been argued that they rely on rote memorization rather than grammatical generalization. Using a human-scale corpus (100M words), we iteratively trained transformer language models on systematically manipulated corpora and then evaluated their learning of a particular rare grammatical phenomenon: the English Article+Adjective+Numeral+Noun (AANN) construction ("a beautiful five days"). We compared how well this construction was learned on the default corpus relative to a counterfactual corpus in which the AANN sentences were removed. Even without direct exposure, AANNs were still learned better than systematically perturbed variants of the construction. Using additional counterfactual corpora, we suggest that this learning occurs through generalization from related constructions (e.g., "a few days"). An additional experiment showed that this learning is enhanced when there is more variability in the input. Taken together, our results provide an existence proof that models can learn rare grammatical phenomena by generalization from less rare phenomena. Code will be available at (url).
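To make the counterfactual-corpus manipulation concrete, here is a minimal sketch of removing AANN-matching sentences from a corpus. The regex, the number-word list, and the function name `remove_aann_sentences` are illustrative assumptions on our part, not the paper's actual AANN detection pipeline.

```python
import re

# Crude surface pattern for AANN-like strings ("a beautiful five days"):
# indefinite article + one adjective-like word + a number word + a plural noun.
# Illustrative heuristic only; a real pipeline would use POS tags or parses.
NUMBER_WORDS = r"(?:two|three|four|five|six|seven|eight|nine|ten|\d+)"
AANN_PATTERN = re.compile(
    rf"\ban?\s+\w+\s+{NUMBER_WORDS}\s+\w+s\b",
    flags=re.IGNORECASE,
)

def remove_aann_sentences(sentences):
    """Return a counterfactual corpus with AANN-matching sentences removed."""
    return [s for s in sentences if not AANN_PATTERN.search(s)]

corpus = [
    "They stayed for a beautiful five days.",
    "The trip lasted five days.",
    "She read a few pages before bed.",
]
print(remove_aann_sentences(corpus))
# -> ['The trip lasted five days.', 'She read a few pages before bed.']
```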
Paper Type: Long
Research Area: Linguistic theories, Cognitive Modeling and Psycholinguistics
Research Area Keywords: linguistic theories
Contribution Types: Model analysis & interpretability
Languages Studied: English
Section 2 Permission To Publish Peer Reviewers Content Agreement: Authors decline to grant permission for ACL to publish peer reviewers' content
Submission Number: 120