Track: long paper (up to 10 pages)
Keywords: Reasoning, SMT Solver, Transfer Learning, Large Language Models
TL;DR: We show that compact LMs learn SMT-style reasoning via solver-trace distillation, improving in-distribution performance and generalizing out-of-distribution.
Abstract: We conduct an extensive comparative analysis of compact language models with fewer than 10B parameters in the domain of Satisfiability Modulo Theories (SMT). While recent work has shown that billion-parameter models can perform logical reasoning, their application to formal SMT solvers like Z3 \cite{de2008z3} remains underexplored, particularly regarding the trade-off between translation (parsing natural language into SMT) and execution (imitating solver reasoning traces). Moreover, while state-of-the-art commercial LLMs perform strongly on SMT-style problems, the accuracy of smaller models on such tasks remains particularly low due to their limited capacity for arithmetic reasoning. We explore whether a lightweight language model can be trained to perform well on SMT-based arithmetic and Boolean reasoning tasks. Our contribution is a rigorous evaluation of whether resource-constrained models can serve as reliable interfaces to formal verification tools, either by approximating a solver's internal reasoning steps and accurately predicting the satisfiability of a problem without calling external tools, or by correctly formalizing natural language constraints into an SMT problem format that a solver such as Z3 can solve. Finally, we evaluate how supervised, solver-guided training transfers to a related real-world analytical reasoning task, in order to assess the limitations of our approach.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and URLs.
Submission Number: 210