Abstract: Fundamental Language Models (FLMs) represent a novel paradigm that separates linguistic competence from factual knowledge to address critical challenges in current language models, including hallucinations, data privacy concerns, and training-induced biases. This paper investigates whether FLMs can maintain robust language processing capabilities while externalizing factual knowledge. Through a comprehensive evaluation of linguistic competence across model sizes using specialized benchmarks, we assess lexical, grammatical, and semantic capabilities. We also analyze how model size affects the encoding of both linguistic and factual knowledge. Our findings demonstrate that linguistic competence stabilizes at relatively modest model sizes, while factual knowledge continues to scale with model size. These results provide empirical support for FLMs as a promising research direction, suggesting that future work could effectively balance language understanding with external knowledge retrieval.
Paper Type: Long
Research Area: Special Theme (conference specific)
Research Area Keywords: retrieval-augmented models, data influence, linguistic theories, reasoning, benchmarking
Languages Studied: English
Submission Number: 2560