Artificial Incorrectness: SMT and LLMs in Hardware Synthesis

Edward Wang, Joe Walston, Luca Daniel, Tony Tan, Yoni Zohar, Clark Barrett

Published: 06 May 2026, Last Modified: 14 May 2026 | NFM 2026 | Everyone | CC BY 4.0

Abstract: The adoption of large language models (LLMs) in hardware design automation poses correctness risks for safety-critical applications. We systematically evaluate LLMs against Satisfiability Modulo Theories (SMT) solvers across three hardware synthesis tasks and find that, in our benchmarks, LLMs achieve lower functional correctness than SMT-based approaches. Our findings reveal a crucial distinction: whilst SMT solvers can excel at direct synthesis and can exhaustively validate LLM outputs, their counterexample feedback fails to improve LLM performance. Effective validation therefore does not translate into effective improvement guidance for LLMs. These results establish formal methods as essential for direct synthesis and point to a need for better iterative refinement methods in reliable AI-assisted hardware design.