Transforming Language Models into Program Interpreters via Execution Trace Chain of Thought

TMLR Paper 6779 Authors

02 Dec 2025 (modified: 14 Jan 2026) · Under review for TMLR · CC BY 4.0
Abstract: Code execution reasoning (CER), the ability to predict how code executes on a given input, has become an expected aspect of language models' (LMs') coding capabilities. However, many open-source LMs perform poorly on simple code snippets and, as our observations show, exhibit limitations even on a single basic operation. To enable LMs to accumulate fine-grained reasoning steps in a structured format, we propose using extremely granular execution traces as chain-of-thought rationales. Specifically, we introduce a fine-tuning method called ET-CoT (Execution Trace Chain of Thought), which leverages execution traces generated by our custom code interpreter; these traces expand every expression at sub-line granularity, going beyond merely logging intermediate variable values. After fine-tuning with 127k examples, ET-CoT consistently improves CER performance across models and benchmarks; for instance, fine-tuned Qwen2.5-7B-Instruct outperforms its official Coder counterpart. In addition, our custom tests show improved accuracy on repeated application of simple operations. Overall, ET-CoT provides strong baselines and insights for improving CER performance.
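To make the idea of a sub-line-level trace concrete, the following is a minimal, hypothetical sketch in Python of how every sub-expression of a statement could be expanded into individual trace steps, using the standard ast module. The trace_expr and trace_assign helpers and the exact trace format are illustrative assumptions, not the paper's custom interpreter or its actual output format.

# Hypothetical sketch of a sub-line-level execution trace used as a
# chain-of-thought rationale: every sub-expression is reduced and logged,
# rather than only recording final variable values per line.
import ast

def trace_expr(node, env, steps):
    # Recursively evaluate an expression AST node, recording each reduction.
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        value = env[node.id]
        steps.append(f"{node.id} -> {value!r}")
        return value
    if isinstance(node, ast.BinOp):
        left = trace_expr(node.left, env, steps)
        right = trace_expr(node.right, env, steps)
        ops = {ast.Add: lambda a, b: a + b,
               ast.Sub: lambda a, b: a - b,
               ast.Mult: lambda a, b: a * b}
        value = ops[type(node.op)](left, right)
        steps.append(f"{ast.unparse(node)} -> {value!r}")
        return value
    raise NotImplementedError(type(node).__name__)

def trace_assign(line, env):
    # Trace a single assignment statement, expanding all sub-expressions.
    stmt = ast.parse(line).body[0]
    assert isinstance(stmt, ast.Assign)
    steps = []
    value = trace_expr(stmt.value, env, steps)
    target = stmt.targets[0].id
    env[target] = value
    steps.append(f"{target} = {value!r}")
    return steps

env = {"a": 3, "b": 4}
for step in trace_assign("x = a + b * 2", env):
    print(step)
# Printed trace:
# a -> 3
# b -> 4
# b * 2 -> 8
# a + b * 2 -> 11
# x = 11

Serializing such step-by-step reductions as text would yield the kind of structured, fine-grained rationale the abstract describes, which an LM can then be fine-tuned to produce before predicting the final output.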
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Varun_Kanade1
Submission Number: 6779