Abstract: We consider syntactic center embedding, in which an embedding phrase contains material on both sides of the embedded phrase. While a single center embedding is easily understood by human language users, multiple center embeddings generally are not. Nevertheless, it is a standard view in linguistic theory that multiple center embeddings are grammatically acceptable: human linguistic competence includes this ability, but performance limitations obscure it. We construct sentences with center embeddings at depths ranging from 1 to 4 and find that GPT-4 achieves nearly perfect accuracy even at depths 3 and 4, whereas other LLMs show a sharp drop in accuracy above depth 1. We suggest that this is because GPT-4 has successfully learned the same underlying linguistic competence as humans while not being subject to the same performance limitations. This would mean that human linguistic competence is observed more clearly in GPT-4 than in humans themselves.
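To make the construction concrete, the following is a minimal sketch of how center-embedded stimuli of increasing depth could be generated. It is not the paper's actual stimulus-generation procedure; the noun and verb lists and the matrix verb "died" are illustrative assumptions.

```python
# Minimal sketch (not the paper's stimulus-generation code): building English
# center-embedded sentences of a given depth. Word lists are illustrative.

NOUNS = ["the rat", "the cat", "the dog", "the fox", "the owl"]
VERBS = ["chased", "bit", "saw", "feared"]

def center_embed(depth: int) -> str:
    """Return a sentence with `depth` levels of center embedding.

    depth=1: "the rat the cat chased died."
    depth=2: "the rat the cat the dog bit chased died."
    Each added level nests another subject-verb pair inside the previous one.
    """
    assert 1 <= depth <= len(VERBS)
    subjects = NOUNS[: depth + 1]      # outermost subject first
    embedded_verbs = VERBS[:depth]     # verbs of the embedded relative clauses
    # Subjects stack left to right, the embedded verbs unwind innermost-first,
    # and the matrix verb closes the sentence.
    return " ".join(subjects + list(reversed(embedded_verbs)) + ["died"]) + "."

if __name__ == "__main__":
    for d in range(1, 5):
        print(f"depth {d}: {center_embed(d)}")
```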
Paper Type: Short
Research Area: Linguistic theories, Cognitive Modeling and Psycholinguistics
Research Area Keywords: linguistic theories, cognitive modeling, computational psycholinguistics
Contribution Types: Model analysis & interpretability, Theory
Languages Studied: English
Submission Number: 90