VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation

ICLR 2026 Conference Submission 13910 Authors

18 Sept 2025 (modified: 08 Oct 2025) · CC BY 4.0
Keywords: Hardware design, Reasoning Models, Electronic Design Automation
TL;DR: Novel framework combining reinforcement learning with reasoning for Verilog code generation achieves 83.1% correctness, outperforming GPT-4 Turbo and showing 2.8× improvement over baselines.
Abstract: Automating Register Transfer Level (RTL) code generation using Large Language Models (LLMs) offers substantial promise for streamlining digital circuit design and reducing human effort. However, current LLM-based approaches for RTL code generation face significant challenges. Methods such as supervised fine-tuning (SFT), in-context learning, and chain-of-thought (CoT) prompting struggle with several critical limitations in the RTL domain: the scarcity of high-quality training data, poor alignment between natural language specifications and generated code, the lack of built-in verification mechanisms, and difficulty balancing model generalization against domain specialization. Inspired by groundbreaking research such as DeepSeek-R1, which combines reinforcement learning with reasoning capabilities, we introduce VeriReason, a comprehensive framework that integrates supervised fine-tuning with Guided Reward Proximal Optimization (GRPO) reinforcement learning specifically tailored for RTL code generation. Using our curated high-quality training examples alongside a feedback-driven reward model, VeriReason combines testbench evaluations with structural heuristics to improve specification-code alignment and eliminate hallucinations. Iterative GRPO embeds intrinsic self-checking and reasoning capabilities, enabling the model to autonomously detect and correct functional errors. VeriReason delivers significant improvements: it achieves 83.1% functional correctness on the VerilogEval Machine benchmark, substantially outperforming both comparable-sized models and much larger commercial systems such as GPT-4 Turbo. Additionally, our approach demonstrates up to a 2.8× increase in first-attempt functional correctness over baseline methods and exhibits robust generalization to unseen designs.
To our knowledge, VeriReason represents the first system to successfully integrate explicit reasoning capabilities with reinforcement learning for Verilog generation, establishing a new state-of-the-art for automated RTL synthesis. The code is available at: https://anonymous.4open.science/r/VeriReason-E625.
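The abstract describes a reward model that combines testbench feedback with structural heuristics. The sketch below illustrates one way such a composite reward could be formed; the function names, heuristic checks, and weights are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of a composite RTL-generation reward:
# a functional term from testbench pass rate plus simple structural
# heuristics over the generated Verilog. Weights and checks are
# illustrative assumptions only.

def structural_score(code: str) -> float:
    """Toy structural heuristics rewarding plausible Verilog shape."""
    checks = [
        "module" in code,      # has a module declaration
        "endmodule" in code,   # module is closed
        ";" in code,           # contains at least one statement
    ]
    return sum(checks) / len(checks)

def reward(code: str, passed_tests: int, total_tests: int,
           w_func: float = 0.8, w_struct: float = 0.2) -> float:
    """Combine testbench feedback with structural heuristics."""
    functional = passed_tests / total_tests if total_tests else 0.0
    return w_func * functional + w_struct * structural_score(code)

example = "module xorg(input a, b, output y); assign y = a ^ b; endmodule"
print(round(reward(example, passed_tests=9, total_tests=10), 3))
```

Weighting the functional term heavily reflects the framework's emphasis on testbench evaluation, while the structural term gives a dense signal even when all tests fail.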
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 13910