Keywords: Machine Learning for Systems; Verilog Retrieval; Hardware Description Languages (HDL); Knowledge Graphs; Graph-based Retrieval; SystemVerilog Code Analysis; Electronic Design Automation (EDA); Code Representation Learning; Semantic Search for RTL; Program Analysis and Optimization; AI for Hardware Design; Graph-based Code Understanding; Retrieval-Augmented Generation (RAG) for Systems; Hardware–Software Co-design
TL;DR: We introduce a knowledge-graph–based retrieval system that outperforms standard RAG on Verilog repositories, raising file-level recall from 31% to 55–79%.
Abstract: We present a retrieval system for answering questions about Verilog/SystemVerilog code bases. Standard vector RAG (retrieval-augmented generation) often fails on hardware description languages because of identifier renaming, coding-style variation, hierarchy, and concurrency. We instead construct knowledge graphs over the code and its LLM-generated explanations and retrieve based on their entities and relations. We do this by adapting the GraphRAG package, originally intended for natural language, to our code use case. We compare (i) standard semantic retrieval over the explanations, (ii) GraphRAG over the code, and (iii) GraphRAG over the explanations. On a corpus of ∼3.5K files and a benchmark of 29 questions, measured by top-1 file-level recall, the semantic-retrieval baseline reaches 31%. GraphRAG consistently outperforms it, achieving 55–59% when using the explanations and up to 79% when retrieved equivalent files are also considered. Constructing the graph with GPT-4o-mini worked well without requiring the larger GPT-4o, but GPT-4o was needed to answer the queries well. Our results indicate that the proposed graph-based approach could be useful for answering hardware designers' questions about a code base.
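The evaluation metric can be illustrated with a minimal sketch. The abstract does not spell out its scoring procedure, so this is only one plausible reading of top-1 file-level recall: a question counts as a hit when the top-ranked file is the gold file, or, in the lenient variant, a file marked as equivalent to it. The function name, the data layout, and all file names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional, Set


def top1_file_recall(
    ranked: Dict[str, List[str]],
    gold: Dict[str, str],
    equivalents: Optional[Dict[str, Set[str]]] = None,
) -> float:
    """Fraction of questions whose top-ranked file is the gold file.

    ranked:      question id -> retrieved files, best first (assumed layout)
    gold:        question id -> the single expected file (assumed layout)
    equivalents: gold file -> files treated as interchangeable with it
                 (assumption: e.g. duplicated or renamed copies of a module)
    """
    if not gold:
        return 0.0
    hits = 0
    for qid, gold_file in gold.items():
        results = ranked.get(qid, [])
        if not results:
            continue  # nothing retrieved counts as a miss
        accepted = {gold_file}
        if equivalents:
            accepted |= equivalents.get(gold_file, set())
        if results[0] in accepted:
            hits += 1
    return hits / len(gold)


# Hypothetical toy data, not from the paper's benchmark.
gold = {"q1": "uart_tx.v", "q2": "fifo.sv", "q3": "alu.v", "q4": "spi.v"}
ranked = {
    "q1": ["uart_tx.v", "uart_rx.v"],   # exact hit
    "q2": ["fifo_copy.sv", "fifo.sv"],  # hit only if equivalents count
    "q3": ["decoder.v"],                # miss
    "q4": [],                           # nothing retrieved
}
equivalents = {"fifo.sv": {"fifo_copy.sv"}}

strict = top1_file_recall(ranked, gold)
lenient = top1_file_recall(ranked, gold, equivalents)
print(f"strict={strict:.2f} lenient={lenient:.2f}")
```

On this toy data the strict score counts only q1, while the lenient score also credits q2, which mirrors how accepting equivalent files can raise the reported recall.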
Submission Number: 26