DeepCode: Open Agentic Coding

ACL ARR 2026 January Submission 9371 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Agentic Coding, LLM Agent, Large Language Model
Abstract: Recent advances in Large Language Models (LLMs) have enabled the shift from coding assistants to autonomous software engineers. However, high-fidelity document-to-codebase synthesis—such as reproducing scientific papers—remains challenging due to the fundamental conflict between information overload and the finite context constraints of LLMs. In this work, we introduce DeepCode, a fully autonomous framework that addresses this challenge through principled information-flow management. By treating repository synthesis as a channel optimization problem, DeepCode maximizes task-relevant signals under strict context budgets via four orchestrated operations: source compression via blueprint distillation, structured indexing using stateful memory, conditional knowledge injection via retrieval-augmented generation, and closed-loop error correction. Extensive evaluations on PaperBench demonstrate that DeepCode achieves state-of-the-art performance, decisively outperforming leading commercial agents and, notably, surpassing PhD-level human experts on key reproduction metrics. Our source code is available at: https://anonymous.4open.science/r/DeepCode-C464.
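The abstract frames repository synthesis as information-flow management under a context budget, realized through four operations. A minimal illustrative sketch of that framing is below; this is not the authors' implementation, and every function name, the `SPEC:` line convention, and the character-based budget are hypothetical stand-ins (real systems would use token budgets, an LLM for generation, and real test feedback).

```python
# Toy sketch (assumed names, not DeepCode's actual API) of the four
# orchestrated operations: blueprint distillation, stateful indexing,
# retrieval-augmented injection, and closed-loop repair.

def distill_blueprint(document: str, budget: int) -> str:
    """Source compression: keep only task-relevant lines (here, lines
    marked 'SPEC:'), truncated to a fixed context budget."""
    relevant = [ln for ln in document.splitlines() if ln.startswith("SPEC:")]
    return "\n".join(relevant)[:budget]

def build_index(blueprint: str) -> dict:
    """Structured indexing: a stateful memory keyed per blueprint entry."""
    return {f"spec_{i}": line
            for i, line in enumerate(blueprint.splitlines())}

def retrieve(memory: dict, query: str) -> list:
    """Conditional knowledge injection: surface only entries that match
    the current query, instead of the whole source document."""
    return [v for v in memory.values() if query.lower() in v.lower()]

def generate_with_repair(context: list, max_rounds: int = 3) -> str:
    """Closed-loop error correction: regenerate until a checker passes.
    The LLM call and the test runner are both stubbed out here."""
    code = ""
    for _ in range(max_rounds):
        code = "\n".join(f"# implements: {c}" for c in context)  # stub LLM
        if code:  # stub checker: accept any non-empty output
            break
    return code

doc = "INTRO: background\nSPEC: sort inputs\nSPEC: dedupe outputs\nNOTES: misc"
blueprint = distill_blueprint(doc, budget=200)
memory = build_index(blueprint)
context = retrieve(memory, "sort")
patch = generate_with_repair(context)
```

The point of the sketch is the channel-optimization shape: each stage passes forward strictly less, but more task-relevant, information than it receives, so the final generation step fits inside the model's context window.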
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: AI / LLM Agents
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 9371