Track: long paper (up to 10 pages)
Keywords: Reasoning, SMT Solver, Transfer Learning, Large Language Models
TL;DR: We show that compact LMs learn SMT-style reasoning via solver-trace distillation, improving in-distribution performance and generalizing out-of-distribution.
Abstract: We conduct an extensive comparative analysis of compact language models with fewer than 10B parameters in the domain of Satisfiability Modulo Theories (SMT). While recent work has shown that billion-parameter models can perform logical reasoning, their application to formal SMT solvers like Z3 \cite{de2008z3} remains underexplored, particularly regarding the trade-off between translation (parsing natural language into SMT) and execution (imitating solver reasoning traces). Moreover, while state-of-the-art commercial LLMs perform strongly on SMT-style problems, the accuracy of smaller models on such tasks remains particularly low due to their limited capacity for arithmetic reasoning. We explore whether a lightweight language model can be trained to perform well on SMT-based arithmetic and Boolean reasoning tasks. Our contribution is a rigorous evaluation of whether resource-constrained models can serve as reliable interfaces to formal verification tools, either by approximating a solver's internal reasoning steps and accurately predicting the satisfiability of a problem without calling external tools, or by correctly formalizing natural language constraints into an SMT problem format that a solver such as Z3 can solve. Finally, we evaluate how supervised, solver-guided training transfers to a related real-world analytical reasoning task, in order to assess the limitations of our approach.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and URLs.
Submission Number: 210