VERT: A SystemVerilog Assertion Dataset to Improve Hardware Verification with LLMs

25 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Hardware Verification, Large Language Models, SystemVerilog
TL;DR: This paper introduces VERT, a novel dataset that automates SystemVerilog assertion generation for hardware verification using Large Language Models.
Abstract: Hardware verification is a critical step in the modern System-on-Chip (SoC) design cycle, consuming approximately 70% of development time. SystemVerilog assertions are pivotal in the verification process, ensuring that designs function as intended. However, existing industrial practice relies on manual assertion generation, which becomes increasingly untenable as hardware systems grow more complex. Recent research has explored the potential of Large Language Models (LLMs) to automate the hardware verification process, reducing human intervention. Despite this, State-of-the-Art (SOTA) proprietary models, such as OpenAI's GPT-4o, have shown limitations in generating accurate assertions, require costly licenses, and impose usage restrictions. While smaller, open-source LLMs offer a more accessible option, they require fine-tuning to handle the complexities of the source code and generate accurate assertions. This highlights the need for a dataset that enables these models to achieve performance superior to SOTA LLMs. To this end, we present VERT, a dataset designed to improve the generation of SystemVerilog assertions using LLMs. Our dataset empowers researchers and hardware corporations to fine-tune smaller, open-source LLMs that surpass larger proprietary models such as GPT-4o in accuracy and efficiency. Furthermore, VERT eliminates the need for expensive licenses and ensures data privacy through local fine-tuning, providing a scalable, cost-effective solution for automated hardware verification. To curate the dataset, we systematically compile and augment variables from open-source hardware description language (HDL) code, generating conditions to create synthetic code snippets paired with corresponding assertions. We show that smaller, open-source LLMs, such as Deepseek Coder 6.7B and Llama 3.1 8B, when fine-tuned on VERT, outperform OpenAI's GPT-4o in assertion generation. The assertions generated by the fine-tuned models are evaluated on industry-standard platforms, including the OpenTitan, CVA6, and Pulpissimo SoCs, demonstrating up to a 96.88% improvement in functional and syntactic correctness over the base models and up to 24.14% over GPT-4o. These results show that VERT enables researchers to reduce the overhead and human error associated with manual assertion generation, offering a scalable solution for industry-grade hardware designs. The dataset is available at https://anonymous.4open.science/r/VERT-4D6D/.
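
To make the snippet-assertion pairing concrete, the following is a minimal, hypothetical sketch (not drawn from the VERT dataset itself; the module and signal names fifo_ctrl, wr_en, full, and overflow are illustrative assumptions): a synthetic RTL condition of the kind the curation process generates, paired with the concurrent SystemVerilog assertion a fine-tuned model would be expected to produce for it.

module fifo_ctrl (
  input  logic clk,
  input  logic rst_n,
  input  logic wr_en,
  input  logic full,
  output logic overflow
);
  // Synthetic design condition: a write attempted while the FIFO is
  // full raises the overflow flag on the next clock edge.
  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) overflow <= 1'b0;
    else        overflow <= wr_en && full;
  end

  // Paired concurrent assertion: a write-while-full must be followed by
  // overflow in the next cycle; checking is disabled during reset.
  overflow_chk: assert property (@(posedge clk) disable iff (!rst_n)
                                 (wr_en && full) |=> overflow)
    else $error("overflow not raised after write while full");
endmodule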
Primary Area: datasets and benchmarks
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5195