Retrieval on Verilog Repositories: A Knowledge-Graph Based Solution

Published: 30 Oct 2025, Last Modified: 04 Nov 2025 | MLForSys2025 | CC BY 4.0
Keywords: Machine Learning for Systems; Verilog Retrieval; Hardware Description Languages (HDL); Knowledge Graphs; Graph-based Retrieval; SystemVerilog Code Analysis; Electronic Design Automation (EDA); Code Representation Learning; Semantic Search for RTL; Program Analysis and Optimization; AI for Hardware Design; Graph-based Code Understanding; Retrieval-Augmented Generation (RAG) for Systems; Hardware–Software Co-design
TL;DR: We introduce a knowledge-graph–based retrieval system that outperforms standard RAG on Verilog repositories, raising file-level recall from 31% to 55–79%.
Abstract: We present a retrieval system for answering questions about Verilog/SystemVerilog code bases. Standard vector RAG (retrieval-augmented generation) often fails on hardware description languages due to identifier renaming, coding-style variation, hierarchy, and concurrency. We instead construct knowledge graphs over the code and its LLM-generated explanations and retrieve based on the entities and relations. We achieve this by adapting the GraphRAG package, originally intended for natural language, to our specific code use case. We compare (i) standard semantic retrieval on the explanations, (ii) GraphRAG over code, and (iii) GraphRAG over the explanations. On a corpus of ∼3.5K files and a benchmark of 29 questions, using top-1 file-level recall, the first baseline reaches 31%. GraphRAG consistently outperforms it, achieving 55–59% when utilizing the explanations, and up to 79% when considering retrieved equivalent files. Constructing the graph with GPT-4o-mini worked well without requiring the larger GPT-4o, but GPT-4o was needed for better answers to the queries. Our results indicate that the suggested graph-based approach could be useful for answering hardware designers' questions about the code base.
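The evaluation metric named in the abstract, top-1 file-level recall, can be illustrated with a minimal sketch. The function name, the file names, and the way "equivalent files" are represented (a mapping from a gold file to a set of acceptable alternatives) are all assumptions made for illustration; the abstract does not specify how equivalence is determined or how the authors compute the score.

```python
from typing import Dict, List, Optional, Set


def top1_file_recall(
    retrieved: Dict[str, List[str]],
    gold: Dict[str, str],
    equivalents: Optional[Dict[str, Set[str]]] = None,
) -> float:
    """Fraction of questions whose top-ranked retrieved file is the gold file.

    `equivalents` (hypothetical representation) maps a gold file to other
    files that should also count as correct, e.g. duplicated modules.
    """
    equivalents = equivalents or {}
    hits = 0
    for question_id, gold_file in gold.items():
        ranked = retrieved.get(question_id, [])
        if not ranked:
            continue  # nothing retrieved counts as a miss
        accepted = {gold_file} | equivalents.get(gold_file, set())
        if ranked[0] in accepted:
            hits += 1
    return hits / len(gold) if gold else 0.0


# Toy data with made-up file names (not taken from the benchmark).
retrieved = {"q1": ["uart_tx.v"], "q2": ["fifo_copy.v"], "q3": ["alu.v"]}
gold = {"q1": "uart_tx.v", "q2": "fifo.v", "q3": "decoder.v"}
equivalents = {"fifo.v": {"fifo_copy.v"}}

strict = top1_file_recall(retrieved, gold)                 # only exact matches
relaxed = top1_file_recall(retrieved, gold, equivalents)   # equivalents accepted
```

In this toy case the strict score counts only `q1`, while the relaxed score also counts `q2`, which mirrors the gap the abstract reports between recall on the exact file and recall when equivalent files are considered.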
Submission Number: 26