Track: long paper (up to 10 pages)
Keywords: neurosymbolic reasoning, logic‑LLM, first‑order logic, pass@k, ensembling, chain‑of‑thought, Prover9, chaining
TL;DR: We show that self-refinement does not improve Logic‑LLM, while selection and chain‑to‑logic pass@k yield consistent gains up to 84.31% on FOLIO.
Abstract: Neurosymbolic reasoning systems combine neural language models with symbolic solvers to produce faithful logical inference. We investigate whether iterative refinement or diversity-based ensembling more effectively improves Logic-LLM on FOLIO. Using GPT-4 and Prover9, we find that solver-guided self-refinement does not improve accuracy in our runs, saturating at 77.94%. In contrast, selection-based methods provide consistent gains: a hybrid selector over original and refined programs and an uncertainty-aware ensemble both reach 79.90%. We further propose a chain-to-logic pipeline that converts multiple reasoning chains into logic programs and aggregates them via pass@k, achieving 84.31% accuracy at pass@3. Our results show that diversity and selective ensembling are more effective than iterative repair for improving neurosymbolic reasoning.
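The pass@k aggregation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not spell out how pass@k is computed, so this assumes the common reading: an example counts as solved if at least one of its first k solver outputs matches the gold label. The function name `pass_at_k`, the toy data, and the use of `None` for a program the solver could not execute are all hypothetical, not taken from the paper.

```python
from typing import Optional, Sequence


def pass_at_k(
    predictions: Sequence[Sequence[Optional[str]]],
    gold: Sequence[str],
    k: int,
) -> float:
    """Fraction of examples where any of the first k predictions equals the gold label.

    Assumption: each inner sequence holds the solver verdicts for one example,
    one verdict per sampled reasoning chain, with None marking a failed run.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("need at least one example")
    if k < 1:
        raise ValueError("k must be at least 1")
    hits = sum(1 for preds, label in zip(predictions, gold) if label in preds[:k])
    return hits / len(gold)


# Toy data using FOLIO's three labels (True / False / Unknown); not real results.
toy_predictions = [
    ["True", "False", "True"],       # first sample already correct
    ["Unknown", "Unknown", "False"],  # only the third sample is correct
    [None, "False", "False"],         # no sample is correct
]
toy_gold = ["True", "False", "True"]

score_at_1 = pass_at_k(toy_predictions, toy_gold, k=1)
score_at_3 = pass_at_k(toy_predictions, toy_gold, k=3)
```

Under this definition the score can only stay equal or rise as k grows, since each additional chain gives every example one more chance to match the gold label.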
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Submission Number: 99