Keywords: model counting, large language models, sudoku, automated reasoning, satisfiability
Abstract: Model counting is a fundamental problem in computer science with applications ranging from mutation modeling in DNA to statistical physics. Instead of finding a single solution to a task, model counting asks for the exact number of distinct solutions. While large language models (LLMs) show impressive performance on a variety of reasoning tasks, their effectiveness has mostly been studied in the context of optimization and decision problems. In this paper, we bridge this gap by studying the capabilities of LLMs in model counting for combinatorial problems. Using the popular Sudoku puzzle as an illustrative example, we evaluate how well LLMs count the number of solutions of different Sudoku puzzles. We show that, despite decent initial performance, LLMs are fragile to modifications of the problem encoding. We also study how different representations of the problem affect performance. In particular, we preprocess our formulas into d-DNNF, an important fragment of propositional logic for which the model counting problem is tractable. On this simpler fragment, the performance of reasoning LLMs improves, indicating that they may be able to count on simpler problems using chain-of-thought, although not consistently. Finally, we study whether LLMs can generate Python programs that compute exact model counts. Unsurprisingly, while LLMs struggle to count by themselves, they are far more reliable when writing code to do this job.
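For intuition on why counting is easy on this fragment, here is a minimal Python sketch, assuming a smooth d-DNNF circuit (all disjuncts of an OR node mention the same variables): model counts multiply at decomposable AND nodes and add at deterministic OR nodes. The `Lit`, `And`, and `Or` classes are hypothetical illustration, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import List, Union

# Minimal node types for a (smooth, deterministic, decomposable) NNF circuit.
@dataclass
class Lit:            # a literal, e.g. x3 or -x3
    var: int
    positive: bool

@dataclass
class And:            # decomposable conjunction: children share no variables
    children: List["Node"]

@dataclass
class Or:             # deterministic disjunction: children are mutually exclusive
    children: List["Node"]

Node = Union[Lit, And, Or]

def count_models(node: Node) -> int:
    """Count models bottom-up: products at AND nodes, sums at OR nodes."""
    if isinstance(node, Lit):
        return 1                      # exactly one assignment to its variable satisfies it
    if isinstance(node, And):
        result = 1
        for child in node.children:   # decomposability -> counts multiply
            result *= count_models(child)
        return result
    result = 0
    for child in node.children:       # determinism -> counts add without overlap
        result += count_models(child)
    return result

# Example: (x1 AND x2) OR (-x1 AND x2), a smooth d-DNNF with exactly 2 models.
formula = Or([And([Lit(1, True), Lit(2, True)]),
              And([Lit(1, False), Lit(2, True)])])
print(count_models(formula))  # -> 2
```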
Supplementary Material: zip
Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)
Submission Number: 13397