HDL-FixBench: A Verifiable Repository-Level Benchmark for Hardware Bug Repair

20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Hardware Engineering, Large Language Model, Electronic Design Automation, Benchmark
TL;DR: We introduce HDL-FixBench, the first benchmark for evaluating LLMs on repository-level hardware bug repair tasks
Abstract: Existing benchmarks for hardware design primarily assess Large Language Models (LLMs) on isolated, component-level Hardware Description Language (HDL) code generation from specifications, overlooking the critical challenge of repository-scale bug repair. To address this gap, we introduce HDL-FixBench, the first benchmark for repository-level hardware bug repair. It comprises 57 high-fidelity instances curated from three industry-standard open-source hardware projects: OpenTitan, CVA6, and Ibex. Each instance is curated through a rigorous methodology, combining a novel agent-based filtering pipeline with meticulous manual verification, and is accompanied by a fully reproducible, containerized EDA environment to ensure task quality and relevance. Evaluating seven state-of-the-art LLMs with two prominent agent frameworks (SWE-Agent and OpenHands) on HDL-FixBench, we find that even the most advanced models perform significantly worse than on SWE-bench Verified, with the top-performing model resolving only 40.3% of tasks. This finding highlights the unique complexities of hardware engineering and establishes HDL-FixBench as a challenging and crucial benchmark for advancing the next generation of automated hardware design and verification tools.
Primary Area: datasets and benchmarks
Submission Number: 24163