Keywords: GenBen; Benchmark; LLM-Aided Design; LLM; Hardware Design
TL;DR: An Open Source Benchmark for LLM-Aided Hardware Design
Abstract: This paper introduces GenBen, a generative benchmark designed to evaluate the capabilities of large language models (LLMs) in hardware design. With the rapid advancement of LLM-aided design (LAD), it has become crucial to assess the effectiveness of these models in automating hardware design processes.
Existing benchmarks primarily focus on hardware code generation and often neglect critical aspects such as Quality-of-Result (QoR) metrics, design diversity, modality, and test set contamination. GenBen is the first open-source, generative benchmark tailored for LAD that encompasses a range of tasks, from high-level architecture to low-level circuit optimization, and includes diverse, silicon-proven hardware designs.
We have also designed a difficulty tiering mechanism to provide fine-grained insight into improvements in LLM-aided design. Through extensive evaluations of several state-of-the-art LLMs using GenBen, we reveal their strengths and weaknesses in hardware design automation. Our findings are based on 10,920 experiments and 2,160 hours of evaluation, underscoring the potential of this work to significantly advance the LAD research community.
In addition, GenBen employs an end-to-end testing infrastructure to ensure consistent and reproducible results across different LLMs. The benchmark is available at https://anonymous.4open.science/r/GENBEN-2812.
Primary Area: datasets and benchmarks
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 14237