Keywords: deep-research, coding agent, paper to code
Abstract: Recent Large Language Models (LLMs) demonstrate strong code generation capabilities, but they often fall short in translating complex, multi-component research methodologies into a coherent, functional codebase, and automatic repository-level code synthesis from research papers remains a formidable challenge. Current paper reproduction agents show promising results, particularly in efficiently generating code repositories from scratch, yet their reliance on staged prompt engineering limits them on complex implementation tasks. In this paper, we propose a multi-agent framework for automated paper reproduction that combines deep research mechanisms, a long-short term memory architecture, and modular generation strategies driven by LLMs. Our system employs a structured workflow in which specialized agents autonomously decompose complex implementation tasks into manageable sub-tasks, facilitating efficient and scalable code synthesis. Experimental evaluation on PaperBench demonstrates state-of-the-art performance in automated research paper implementation, achieving a Replication Score of 63.2%.
Supplementary Material: pdf
Submission Number: 58