FCoReBench: Can Large Language Models Solve Challenging First-Order Combinatorial Reasoning Problems?

Chinmay Mittal; Krishna Kartik; Parag Singla; Mausam .

FCoReBench: Can Large Language Models Solve Challenging First-Order Combinatorial Reasoning Problems?

Chinmay Mittal, Krishna Kartik, Parag Singla, Mausam .

28 Sept 2024 (modified: 05 Feb 2025)Submitted to ICLR 2025EveryoneRevisionsBibTeXCC BY 4.0

Keywords: llms, logical-reasoning, first-order-reasoning, neuro-symbolic

TL;DR: We introduce a dataset for first-order combinatorial reasoning and propose a method to integrate LLMs with symbolic solvers through programs, we show significant performance improvements on our dataset and effectiveness on other benchmarks.

Abstract: Can the large language models (LLMs) solve challenging first-order combinatorial reasoning problems such as graph coloring, knapsack, and cryptarithmetic? By first-order, we mean these problems can be instantiated with potentially an infinite number of problem instances of varying sizes. They are also challenging being NP-hard and requiring several reasoning steps to reach a solution. While existing work has focused on coming up with datasets with hard benchmarks, there is limited work which exploits the first-order nature of the problem structure. To address this challenge, we present FCoReBench, a dataset of 40 such challenging problems, along with scripts to generate problem instances of varying sizes and automatically verify and generate their solutions. We first observe that LLMs, even when aided by symbolic solvers, perform rather poorly on our dataset, being unable to leverage the underlying structure of these problems. We specifically observe a drop in performance with increasing problem size. In response, we propose a new approach, SymPro-LM, which combines LLMs with both symbolic solvers and program interpreters, along with feedback from a few solved examples, to achieve huge performance gains. Our proposed approach is robust to changes in the problem size, and has the unique characteristic of not requiring any LLM call during inference time, unlike earlier approaches. As an additional experiment, we also demonstrate SymPro-LM’s effectiveness on other logical reasoning benchmarks.

Supplementary Material: zip

Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 14114

Loading