Abstract: This paper introduces LingGym, a new benchmark that evaluates LLMs’ capacity for meta-linguistic reasoning using Interlinear Glossed Text (IGT) and grammatical descriptions extracted from 18 typologically diverse reference grammars. Unlike previous work that focuses on specific downstream tasks, we assess whether LLMs can generalize linguistic inference across low-resource languages and structures not seen during training. We present a controlled evaluation task: Morpheme-Gloss Inference, in which the model must infer a missing morpheme and gloss from context using varying levels of linguistic information (e.g., glosses, grammatical explanations, translations). Our results show that incorporating structured linguistic cues leads to consistent improvements in reasoning performance across all models. This work highlights both the promise and current limitations of using LLMs for typologically informed linguistic analysis and low-resource language documentation.
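To make the task concrete, the following is a minimal, purely illustrative sketch of what one Morpheme-Gloss Inference item could look like; the field names, the toy language data, and the cue structure are assumptions for illustration, not the paper's actual schema.

```python
# Hypothetical Morpheme-Gloss Inference item (toy data, not from LingGym).
# The model sees an IGT line with one morpheme masked and must recover both
# the surface morpheme and its gloss, aided by optional cue levels.
example = {
    "source_line": "naku ____-na wasi-ta",    # IGT line, target morpheme masked
    "gloss_line": "1SG ____-PST house-ACC",   # aligned gloss line, target gloss masked
    "cues": {                                 # cue levels named in the abstract
        "translation": "I saw the house.",    # free translation
        "grammar_excerpt": "Past tense is marked by a verbal suffix.",
    },
    "target": {"morpheme": "rika", "gloss": "see"},  # what the model must infer
}
```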
Paper Type: Long
Research Area: Multilingualism and Cross-Lingual NLP
Research Area Keywords: multilingual benchmarks, less-resourced languages, endangered languages, language documentation, resources for less-resourced languages
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: Fwe, Gyeli, Ik, Japhug, Kagayanen, Kalamang, Komnzo, Mauwake, Mehweb, Moloko, Palula, Papuan Malay, Pichi, Rapa Nui, Tuatschin, Ulwa, Vamale, Yauyos Quechua
Keywords: multilingual benchmarks, less-resourced languages, endangered languages, language documentation, resources for less-resourced languages
Submission Number: 4463