The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner

Published: 26 Jan 2026 · Last Modified: 26 Feb 2026 · ICLR 2026 Poster · CC BY 4.0
Keywords: Length Generalization, Large Language Models, Turing Machine, Chain-of-Thought, Computable Reasoning, Synthetic Dataset
TL;DR: Train LLMs to imitate Turing machines for universal and effective length generalization on a challenging synthetic dataset of 18 tasks across 8 algorithmic classes.
Abstract: Length generalization, the ability to solve problems with sequences longer than those observed during training, poses a core challenge for Transformer-based large language models (LLMs). Although existing studies have predominantly focused on data-driven approaches for particular arithmetic operations or symbolic manipulation tasks, these approaches tend to be task-specific, with limited performance on individual tasks. To pursue a more general solution, this paper focuses on the broader class of reasoning problems that are *computable*, *i.e.*, problems that algorithms can solve and that can therefore be solved by a Turing machine, which operates over inputs of unbounded length. From this perspective, this paper proposes **T**uring m**A**chine **I**mitation **L**earning (**TAIL**) to improve the length generalization ability of LLMs. TAIL uses computer programs to directly synthesize chain-of-thought (CoT) data that imitates the execution process of a Turing machine: it *linearly* expands the reasoning steps into *atomic* states to alleviate shortcut pattern learning, and introduces an explicit *memory*-fetch mechanism to reduce the difficulty of dynamic and long-range data access. To validate the universality and reliability of TAIL, we construct a challenging synthetic dataset covering 8 classes of algorithms and 18 tasks. Without bells and whistles, TAIL significantly improves both the length generalization ability and the per-task performance of Qwen2.5-7B using only synthetic data, surpassing previous methods and DeepSeek-R1. The experimental results reveal that the key concepts of the Turing machine, rather than human-like thinking styles, are indispensable to TAIL for length generalization; trained this way, the model exhibits read-and-write behaviors consistent with the properties of the Turing machine in its attention layers. This work provides a promising direction for future research on learning LLM reasoning from synthetic data.
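To make the data-synthesis idea in the abstract concrete, below is a minimal, hypothetical Python sketch (not the authors' released pipeline) of how a program could emit a TAIL-style CoT trace for one toy task, binary increment. The function name `synthesize_trace` and the trace format are assumptions for illustration; the sketch only shows the two properties the abstract emphasizes: execution linearized into atomic states, and explicit memory (tape) reads and writes at each step.

```python
# Hypothetical sketch of TAIL-style CoT synthesis for a toy task (binary increment).
# Each emitted step is one atomic state with an explicit tape read and write,
# mirroring the Turing-machine execution described in the abstract.

def synthesize_trace(bits: str) -> tuple[str, list[str]]:
    """Return the incremented bit string and a step-by-step CoT trace."""
    tape = list(bits)
    head = len(tape) - 1            # start at the least-significant bit
    carry = 1                       # incrementing by one
    steps = []
    while head >= 0 and carry:
        read = tape[head]                       # explicit memory fetch
        write = "0" if read == "1" else "1"
        carry = 1 if read == "1" else 0
        steps.append(
            f"state=INC head={head} read={read} write={write} carry={carry}"
        )
        tape[head] = write                      # explicit memory write
        head -= 1
    if carry:                                   # overflow: prepend a new bit
        tape.insert(0, "1")
        steps.append("state=INC head=-1 write=1 carry=0")
    steps.append("state=HALT")
    return "".join(tape), steps


if __name__ == "__main__":
    answer, cot = synthesize_trace("1011")      # 11 -> 12, i.e. "1100"
    print("\n".join(cot))
    print("answer:", answer)
```

Because the loop body touches only the cell under the head, the trace length grows linearly with input length, which is the property that is meant to discourage shortcut pattern learning and support length generalization.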
Primary Area: foundation or frontier models, including LLMs
Submission Number: 1453