RefP2C: Reflective Paper-to-Code Development Enabled by Fine-Grained Verification

ICLR 2026 Conference Submission19213 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: paper-to-code development, reflection, fine-grained verification, agent
Abstract: Reproducing machine learning papers is essential for scientific progress but remains challenging for both humans and automated agents. Analyses from prior studies reveal that the most prevalent issues arise during the code development phase, which is the foundational first step toward successful reproduction. Within this phase, agents often struggle to fully and accurately replicate implementation details such as mathematical formulas and algorithmic logic. Previous studies further show that reflection with explicit feedback improves agent performance, yet current paper reproduction methods fail to adopt this strategy effectively. This gap arises mainly from the diverse paper patterns, complex method modules, and varied configurations encountered in research papers. Motivated by how humans use systematic checklists to efficiently review complex code, we propose \textbf{RefP2C}, a \textbf{Ref}lective \textbf{P}aper-\textbf{to}-\textbf{C}ode development framework that automatically extracts a paper's fingerprint: a comprehensive set of accurate, atomic criteria that serve as high-quality supervisory signals. The framework first generates code based on the extracted information, then leverages the fingerprint within an iterative verification-and-refinement loop. This approach systematically detects discrepancies and produces targeted revisions that align the generated code with the paper's specifications. Extensive experiments on the PaperBench Code-Dev benchmark show that RefP2C outperforms baselines by 13.0\%, and its reflection correctly revises criteria involving complex logic and mathematics, where its effectiveness is most pronounced.
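The abstract's verification-and-refinement loop can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Criterion` dataclass, and the `verify` and `refine` stubs (which in RefP2C would be LLM calls judging and revising code against each atomic fingerprint criterion), are hypothetical names introduced here for clarity.

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    """One atomic fingerprint criterion extracted from the paper."""
    description: str
    satisfied: bool = False

def verify(code: str, criteria: list[Criterion]) -> list[Criterion]:
    # Stub checker: return the criteria the code does not yet satisfy.
    # In the paper's framework this judgment would come from an LLM.
    return [c for c in criteria if not c.satisfied]

def refine(code: str, failed: list[Criterion]) -> str:
    # Stub reviser: append a targeted revision note per failed criterion.
    return code + "\n# revised for: " + "; ".join(c.description for c in failed)

def refp2c_loop(code: str, criteria: list[Criterion], max_iters: int = 3) -> str:
    """Iterate: detect discrepancies against the fingerprint, then revise."""
    for _ in range(max_iters):
        failed = verify(code, criteria)
        if not failed:
            break  # all fingerprint criteria satisfied
        code = refine(code, failed)
        for c in failed:  # assume each targeted revision fixes its criterion
            c.satisfied = True
    return code
```

The loop terminates either when every atomic criterion passes verification or after a fixed iteration budget, mirroring the "detect discrepancies, then produce targeted revisions" cycle described above.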
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 19213