HiRAS: A Hierarchical Multi-Agent Framework for Paper-to-Code Generation and Execution

ACL ARR 2026 January Submission10149 Authors

06 Jan 2026 (modified: 07 Jun 2026) | ACL ARR 2026 January Submission | CC BY 4.0

Keywords: paper replication, AI/LLM agents, hierarchical multi-agent system, prompt-based evaluation

Abstract: Recent advances in large language models have highlighted their potential to automate computational research, particularly the reproduction of experimental results. However, existing approaches still rely on fixed sequential agent pipelines with weak global coordination, which limits their robustness and overall performance. In this work, we propose the Hierarchical Research Agent System (HiRAS), a hierarchical multi-agent framework for end-to-end paper reproduction in which supervisory manager agents coordinate specialised agents across fine-grained stages. We also identify limitations in the reference-free evaluation of the Paper2Code benchmark and introduce Paper2Code-Extra (P2C-Ex), a refined protocol that incorporates repository-level information and aligns better with the original reference-based metric. An extensive evaluation validates the effectiveness and robustness of the proposed methods: with open-source backbone models, HiRAS achieves a relative performance gain of more than 10% over the previous state of the art, and P2C-Ex significantly reduces hallucination in the evaluation. All code and data will be made publicly available.

Paper Type: Long

Research Area: AI/LLM Agents

Research Area Keywords: LLM/AI agents, architectures, automatic evaluation

Contribution Types: NLP engineering experiment, Reproduction study

Languages Studied: English, Programming Languages

Submission Number: 10149
