Beyond Self-Refinement: Ensembling and Chaining for Neurosymbolic Reasoning

Published: 05 Mar 2026, Last Modified: 25 Apr 2026 | ICLR 2026 Workshop LLM Reasoning | CC BY 4.0
Track: long paper (up to 10 pages)
Keywords: neurosymbolic reasoning, logic‑LLM, first‑order logic, pass@k, ensembling, chain‑of‑thought, Prover9, chaining
TL;DR: We show that self-refinement does not improve Logic‑LLM, while selection and chain‑to‑logic pass@k yield consistent gains up to 84.31% on FOLIO.
Abstract: Neurosymbolic reasoning systems combine neural language models with symbolic solvers to produce faithful logical inference. We investigate whether iterative refinement or diversity-based ensembling more effectively improves Logic-LLM on FOLIO. Using GPT-4 and Prover9, we find that solver-guided self-refinement does not improve accuracy in our runs, saturating at 77.94%. In contrast, selection-based methods provide consistent gains: a hybrid selector over original and refined programs and an uncertainty-aware ensemble both reach 79.90%. We further propose a chain-to-logic pipeline that converts multiple reasoning chains into logic programs and aggregates them via pass@k, achieving 84.31% accuracy at pass@3. Our results show that diversity and selective ensembling are more effective than iterative repair for improving neurosymbolic reasoning.
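The pass@k aggregation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common "any of k correct" reading of pass@k, which the abstract does not spell out, and it uses hard-coded toy labels in place of real solver output. The function name `pass_at_k`, the label strings, and the use of `None` for a program that failed to parse or execute are all hypothetical choices made for illustration, not the authors' implementation.

```python
from typing import Optional, Sequence


def pass_at_k(
    candidates: Sequence[Sequence[Optional[str]]],
    gold: Sequence[str],
    k: int,
) -> float:
    """Fraction of problems where any of the first k candidate answers equals the gold label.

    Assumption: candidates[i][j] is the label obtained by running the j-th
    logic program for problem i through a solver, or None if that program
    failed to parse or execute.
    """
    if len(candidates) != len(gold):
        raise ValueError("candidates and gold must have the same length")
    if not gold:
        raise ValueError("need at least one problem")
    if k < 1:
        raise ValueError("k must be at least 1")

    solved = sum(
        any(answer == label for answer in answers[:k])
        for answers, label in zip(candidates, gold)
    )
    return solved / len(gold)


# Toy data (not from the paper): four problems, three candidate programs each.
toy_candidates = [
    ["True", "True", "False"],    # correct at the first candidate
    [None, "False", "False"],     # first program failed, second is correct
    ["Unknown", "True", None],    # never correct
    ["True", None, "Unknown"],    # correct only at the third candidate
]
toy_gold = ["True", "False", "False", "Unknown"]

for k in (1, 2, 3):
    print(f"pass@{k} = {pass_at_k(toy_candidates, toy_gold, k):.2f}")
```

Under this reading, accuracy can only stay flat or rise as k grows, because each additional candidate gives a problem another chance to be counted as solved.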
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Submission Number: 99