Artificial Incorrectness: SMT and LLMs in Hardware Synthesis
Abstract: The adoption of large language models (LLMs) in hardware design automation poses correctness risks for safety-critical applications.
We systematically evaluate LLMs against Satisfiability Modulo Theories (SMT) solvers across three hardware synthesis tasks and find that, in our benchmarks, LLMs achieve lower functional correctness than SMT-based approaches. Our findings reveal a crucial distinction: whilst SMT solvers can excel at direct synthesis and can exhaustively validate LLM outputs, feeding their counterexamples back to the LLM fails to improve its performance. Effective validation therefore does not translate into effective guidance for LLM refinement. These results establish formal methods as essential for direct synthesis and highlight the need for better iterative refinement methods in reliable AI-assisted hardware design.