Keywords: language models, causality, formal languages, evaluation, learnability
TL;DR: We introduce a methodology for causally evaluating architectures' learnability of formal language properties.
Abstract: Formal languages are increasingly used to analyze the limitations of language-model architectures via properties of their defining automata (e.g., number of states, transition weights, or out-degree at a state). Understanding why a neural language model struggles to learn a given property requires separating two cases: **scarcity** (the property is underrepresented in the data due to its low likelihood) versus **complexity** (the task is more sample-demanding for the chosen learner). In the former case, increasing or resampling the training data can help; in the latter, changes to the architecture or training may be needed. Evaluating a property “on its own” typically involves marginalizing over a family of languages that exhibit it, which can introduce selection bias: some family members may be systematically overrepresented among samples with the property, confounding correlational analyses. Indeed, most existing investigations relate corpus statistics to performance, but such correlations do not identify causal effects. We introduce a causal framework for assessing learnability in probabilistic formal languages. Using an efficient, controlled sampling procedure for regular languages, we *intervene on event frequencies by replacing the unconstrained sampling policy with a controlled one*, leaving the automaton’s topology and weights unchanged, to test whether a property looks difficult because it is rare or because it is genuinely demanding for the learner. Our main contribution is theoretical: the framework and controlled sampling procedures. We illustrate the framework in three case studies with LSTM and Transformer learners, where conclusions drawn under correlational evaluation can invert once causal frequency interventions are applied.
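The intervention described above can be sketched in miniature: sample strings from a probabilistic finite-state automaton, then swap the unconstrained sampling policy for a controlled one that fixes the frequency of a target event, while the automaton's topology and weights stay untouched. This is an illustrative assumption-laden sketch, not the paper's implementation; the toy automaton, the `sample` function, and the `boost_b` policy below are all invented for illustration.

```python
import random

# Toy probabilistic finite-state automaton (illustrative, not from the paper):
# state -> list of (symbol, next_state, probability).
PFSA = {
    0: [("a", 0, 0.7), ("b", 1, 0.3)],
    1: [("a", 0, 0.5), ("b", 1, 0.5)],
}

def sample(pfsa, length, policy=None, start=0, rng=random):
    """Sample a string of `length` symbols.

    By default, arcs are drawn with the automaton's own weights
    (the unconstrained policy). Passing a `policy` replaces those
    weights with controlled ones at sampling time -- an intervention
    on event frequencies that leaves topology and stored weights intact.
    """
    state, out = start, []
    for _ in range(length):
        arcs = pfsa[state]
        probs = policy(arcs) if policy else [p for _, _, p in arcs]
        total = sum(probs)
        r, acc = rng.random() * total, 0.0
        for (sym, nxt, _), p in zip(arcs, probs):
            acc += p
            if r <= acc:
                out.append(sym)
                state = nxt
                break
    return "".join(out)

def boost_b(arcs, rate=0.5):
    """Controlled policy: force the 'event' symbol "b" to a fixed rate,
    e.g. to test whether rarity, rather than complexity, drives poor learning."""
    return [rate if sym == "b" else (1.0 - rate) for sym, _, _ in arcs]
```

A correlational analysis would compare model performance against the naturally occurring frequency of "b"; the causal version resamples training data under `boost_b` at several rates and checks whether the property remains hard when it is no longer rare.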
Supplementary Material: zip
Primary Area: causal reasoning
Submission Number: 13809