RTLSeek: Boosting the LLM-Based RTL Generation with Diversity-Oriented Reinforcement Learning

20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Automatic RTL design, LLM, GRPO-based RL, Multi-Objective Reward
TL;DR: RTLSeek improves the accuracy and diversity of LLM-generated RTL using Diversity-Oriented Reinforcement Learning. It incorporates expert feedback and a three-stage post-training framework, outperforming other dedicated models on benchmarks.
Abstract: Register Transfer Level (RTL) design translates high-level specifications into hardware using HDLs such as Verilog. While LLM-based RTL generation holds promise, the scarcity of functionally verifiable, high-quality data constrains both its accuracy and its diversity. Current SFT-based post-training maps natural-language specifications to HDL code one-to-one, without a deep understanding of how RTL implementations vary across design goals. This paper proposes RTLSeek, a novel post-training paradigm that employs rule-based Diversity-Oriented Reinforcement Learning to improve RTL accuracy and diversity. We introduce Diversity-Centric Multi-Objective Reward Scheduling, which integrates expert knowledge and EDA feedback, along with a three-stage training framework that better utilizes scarce data. Experiments show RTLSeek outperforms other dedicated models on RTLLM, with ablation studies validating its effectiveness.
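The abstract does not spell out the reward scheme; as an illustration only, a rule-based multi-objective reward trading off functional correctness (a stand-in for EDA/testbench feedback) against candidate diversity might be sketched as follows. All function names and the `stage_weight` scheduling knob are assumptions, not the paper's actual implementation:

```python
from difflib import SequenceMatcher

def functional_reward(passed_tests: int, total_tests: int) -> float:
    """Fraction of testbench checks passed (proxy for EDA simulator feedback)."""
    return passed_tests / total_tests if total_tests else 0.0

def diversity_reward(candidate: str, others: list[str]) -> float:
    """Mean dissimilarity of this candidate to the other sampled RTL programs."""
    if not others:
        return 0.0
    sims = [SequenceMatcher(None, candidate, o).ratio() for o in others]
    return 1.0 - sum(sims) / len(sims)

def scheduled_reward(candidate: str, others: list[str],
                     passed: int, total: int, stage_weight: float = 0.3) -> float:
    # stage_weight is a hypothetical per-stage schedule: early training stages
    # could emphasize correctness, later stages raise the diversity term.
    return ((1.0 - stage_weight) * functional_reward(passed, total)
            + stage_weight * diversity_reward(candidate, others))
```

Such a scalar reward could then score each sampled RTL candidate within a GRPO rollout group, which is consistent with the rule-based (rather than learned) reward the abstract describes.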
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 24081