RTLSeek: Boosting the LLM-Based RTL Generation with Diversity-Oriented Reinforcement Learning

20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Automatic RTL design, LLM, GRPO-based RL, Multi-Objective Reward
TL;DR: RTLSeek improves the accuracy and diversity of LLM-generated RTL using Diversity-Oriented Reinforcement Learning. It incorporates expert feedback and a three-stage post-training framework, outperforming other dedicated models on benchmarks.
Abstract: Register Transfer Level (RTL) design translates high-level specifications into hardware using HDLs such as Verilog. While LLM-based RTL generation holds promise, the scarcity of functionally verifiable, high-quality data constrains both its accuracy and its diversity. Current SFT-based post-training maps natural-language specifications to HDL code one-to-one, without a deep understanding of how RTL implementations vary across design goals. This paper proposes RTLSeek, a novel post-training paradigm that employs rule-based Diversity-Oriented Reinforcement Learning to improve RTL accuracy and diversity. We introduce Diversity-Centric Multi-Objective Reward Scheduling, which integrates expert knowledge and EDA feedback, along with a three-stage training framework that better utilizes scarce data. Experiments show RTLSeek outperforms other dedicated models on RTLLM, with ablation studies validating its effectiveness.
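The abstract does not spell out the reward scheme; as an illustration only, a rule-based multi-objective reward trading off functional correctness (a stand-in for EDA/testbench feedback) against candidate diversity might be sketched as follows. All function names and the `stage_weight` scheduling knob are assumptions, not the paper's actual implementation:

```python
from difflib import SequenceMatcher

def functional_reward(passed_tests: int, total_tests: int) -> float:
    """Fraction of testbench checks passed (proxy for EDA simulator feedback)."""
    return passed_tests / total_tests if total_tests else 0.0

def diversity_reward(candidate: str, others: list[str]) -> float:
    """Mean dissimilarity of this candidate to the other sampled RTL programs."""
    if not others:
        return 0.0
    sims = [SequenceMatcher(None, candidate, o).ratio() for o in others]
    return 1.0 - sum(sims) / len(sims)

def scheduled_reward(candidate: str, others: list[str],
                     passed: int, total: int, stage_weight: float = 0.3) -> float:
    # stage_weight is a hypothetical per-stage schedule: early training stages
    # could emphasize correctness, later stages raise the diversity term.
    return ((1.0 - stage_weight) * functional_reward(passed, total)
            + stage_weight * diversity_reward(candidate, others))
```

Such a scalar reward could then score each sampled RTL candidate within a GRPO rollout group, which is consistent with the rule-based (rather than learned) reward the abstract describes.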
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 24081