Not All Code Is Equal: A Data-Centric Study of Code Complexity and LLM Reasoning

TMLR Paper9535 Authors

05 Jun 2026 (modified: 20 Jun 2026) | Under review for TMLR | CC BY 4.0
Abstract: Large Language Models (LLMs) increasingly exhibit strong reasoning abilities, often attributed to their capacity to generate chain-of-thought-style intermediate reasoning. Recent work suggests that exposure to code can further enhance these skills, but existing studies largely treat code as a generic training signal, leaving open the question of which properties of code actually contribute to improved reasoning. To address this gap, we study the structural complexity of code, which captures control flow and compositional structure that may shape how models internalise multi-step reasoning during fine-tuning. We examine two complementary settings: solution-driven complexity, where structural complexity varies across multiple solutions to the same problem, and problem-driven complexity, where structural complexity reflects variation in the underlying tasks. Using cyclomatic complexity and logical lines of code to construct controlled fine-tuning datasets, we evaluate a range of open-weight LLMs on diverse reasoning benchmarks. Our findings show that although code can improve reasoning, its usefulness is substantially shaped by structural properties. In 83% of experiments, restricting fine-tuning data to a specific structural complexity range outperforms training on structurally diverse code, pointing to a data-centric path for improving reasoning beyond scaling.
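As a rough illustration of the dataset construction the abstract describes, the sketch below scores Python snippets with an approximate cyclomatic complexity and a logical-lines-of-code count, then keeps only samples inside a chosen complexity range. The abstract does not say which tooling or thresholds the authors used, so the function names (`cyclomatic_complexity`, `logical_loc`, `filter_by_complexity`), the set of counted decision nodes, and the example range are all assumptions made for illustration.

```python
import ast

# Node types counted as decision points. This choice is an assumption:
# real complexity tools differ in which constructs they count.
_DECISION_NODES = (ast.If, ast.For, ast.While, ast.AsyncFor,
                   ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity: 1 + number of decision points."""
    score = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISION_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra short-circuit branches.
            score += len(node.values) - 1
    return score


def logical_loc(source: str) -> int:
    """Count statement nodes as a stand-in for logical lines of code."""
    return sum(isinstance(node, ast.stmt) for node in ast.walk(ast.parse(source)))


def filter_by_complexity(samples, low, high):
    """Keep samples whose approximate complexity lies in [low, high]."""
    return [s for s in samples if low <= cyclomatic_complexity(s) <= high]


simple = "def f(x):\n    return x + 1\n"
branchy = (
    "def g(x):\n"
    "    if x > 0 and x < 10:\n"
    "        return 1\n"
    "    for i in range(x):\n"
    "        if i % 2:\n"
    "            x += i\n"
    "    return x\n"
)

# Hypothetical "low complexity" subset: complexity between 1 and 2.
low_complexity_subset = filter_by_complexity([simple, branchy], 1, 2)
```

Under these assumptions, a restricted-range fine-tuning set would be the output of `filter_by_complexity` for one band, while a structurally diverse baseline would sample across all bands.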
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Ruoyu_Sun1
Submission Number: 9535