Learnability of Indirect Evidence in Language Models

ACL ARR 2024 June Submission 4711 Authors

16 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: What kinds of data, and how much, are necessary for language models to acquire the grammatical knowledge needed to judge sentence acceptability? Recent language models still have much room for improvement in data efficiency compared to humans. In this paper, we investigate whether language models make efficient use of indirect data (indirect evidence) from which they can infer sentence acceptability. Humans, in contrast, use indirect evidence efficiently, which is considered one of the inductive biases contributing to efficient language acquisition. To explore this question, we inject synthetic instances containing newly coined "wug" words into pretraining data and examine the model's behavior on evaluation data that assess grammatical acceptability regarding those words. We prepare the injected instances at varying levels of indirectness and in varying quantities. Surprisingly, our experiments show that for certain linguistic phenomena, language models fail to acquire grammatical knowledge even after repeated exposure to instances that share the structure of the evaluation instances and differ from them only in lexical items. Our findings suggest a potential direction for future research: developing models that use latent indirect evidence to acquire grammatical knowledge.
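The abstract does not specify the scoring procedure, but acceptability judgments of this kind are typically operationalized as a minimal-pair comparison of sentence log-probabilities (as in BLiMP-style evaluations). Below is a minimal sketch under that assumption, using the HuggingFace transformers API with GPT-2 as a stand-in model; the sentence pair and the coined word "wug" are hypothetical illustrations, not the paper's actual evaluation data.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in model; the paper's models are pretrained on data with injected
# "wug" instances, which this sketch does not reproduce.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_logprob(sentence: str) -> float:
    """Total log-probability of a sentence under the causal LM."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # .loss is the mean negative log-likelihood per predicted token,
        # so multiply by the number of predicted tokens to get the total.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.size(1) - 1)

# Hypothetical minimal pair; "wug" stands in for a newly coined noun.
good = "The wug near the trees was sleeping."
bad = "The wug near the trees were sleeping."

# The model "judges" correctly if it assigns the grammatical variant
# a higher total log-probability than the ungrammatical one.
print(sentence_logprob(good) > sentence_logprob(bad))
```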
Paper Type: Long
Research Area: Linguistic theories, Cognitive Modeling and Psycholinguistics
Research Area Keywords: computational psycholinguistics
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 4711