Language Identification in the Limit with Computational Trace

ICLR 2026 Conference Submission 16016 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: language identification, complexity theory
TL;DR: We define a theoretical model of language identification with CoT, where CoT is defined as having access to computational traces, and we show that with this extra information we can learn Turing machines, thus circumventing classical lower bounds.
Abstract: Training on Chain-of-Thought (CoT) traces has empirically been shown to dramatically improve the capabilities of Large Language Models (LLMs), yet a formal understanding of its power remains limited. In this work, we investigate the role of training on such computational traces from the perspective of language learnability. We introduce a new learning model, identification in the limit with trace, which augments Gold's classic paradigm [Gold'67] by providing the learner not only with examples from a target language but also with computational traces from the machine that accepts them. Our results reveal that access to these traces dramatically enhances the power of the learner. We first prove that with perfect computational traces, the class of all computable languages (those recognizable by Turing Machines) becomes identifiable in the limit. This stands in sharp contrast to Gold's famous impossibility result, which holds even for the simple class of languages recognizable by deterministic finite automata. We then analyze the more challenging scenario where the learner has only partial information about the computational traces, which are also subject to adversarial corruptions. In this setting, we establish a trichotomy of results on the amount of error that can be tolerated for successful identification of language classes across the Chomsky hierarchy.
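The enumeration-based learning that underlies Gold's paradigm can be illustrated with a toy sketch. This is not code from the paper: the hypothesis class, the language names, and the use of informant-style labeled examples (rather than Gold's positive-only text presentation) are illustrative assumptions. The learner simply keeps the first hypothesis in a fixed enumeration that is consistent with all examples seen so far; after finitely many examples its guess stabilizes.

```python
# Illustrative sketch of identification in the limit by enumeration,
# over a tiny finite hypothesis class of languages on the alphabet {a, b}.
# Each language is given by a membership predicate.
HYPOTHESES = [
    ("all a's",     lambda s: set(s) <= {"a"}),
    ("even length", lambda s: len(s) % 2 == 0),
    ("contains ab", lambda s: "ab" in s),
]

def learner(examples):
    """Return the name of the first hypothesis consistent with every
    (string, label) pair seen so far, or None if none fits."""
    for name, accepts in HYPOTHESES:
        if all(accepts(s) == label for s, label in examples):
            return name
    return None

# Target language: "contains ab". Feed labeled examples one at a time;
# the learner's guess eventually stabilizes on a correct hypothesis.
stream = [("aa", False), ("abba", True), ("", False), ("ab", True)]
seen = []
for ex in stream:
    seen.append(ex)
    guess = learner(seen)
print(guess)  # prints "contains ab"
```

Gold's impossibility result shows that this kind of stabilization is unattainable from positive examples alone, even for regular languages; the paper's model restores learnability by supplying computational traces instead of negative examples.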
Primary Area: learning theory
Submission Number: 16016