Keywords: Large Language Models, Logical Reasoning, Neuro-Symbolic AI, Auto-formalization
Abstract: Auto-formalization (AF) translates natural-language reasoning problems into solver-executable programs, enabling symbolic solvers to perform sound logical deduction.
In practice, AF pipelines are brittle: programs may fail to execute, or execute but encode incorrect semantics.
We propose Draft-and-Prune (D\&P), an inference-time framework that improves AF-based logical reasoning via diversity and verification.
D\&P first drafts a natural-language plan and conditions program generation on it.
It further prunes executable but contradictory or ambiguous formalizations, and aggregates surviving predictions by majority vote.
Across four benchmarks (AR-LSAT, ProofWriter, PrOntoQA, LogicalDeduction), D\&P substantially strengthens the AF pathway without extra supervision.
On AR-LSAT, it achieves $77.30\%$ accuracy in the AF-only setting and $81.48\%$ with a CoT fallback (GPT-4o), outperforming the strongest AF baseline, CLOVER, by $30.5$ and $18.7$ points, respectively.
It also attains near-ceiling performance on the other benchmarks, reaching 100% accuracy on PrOntoQA and LogicalDeduction under our setup.
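The drafting, pruning, and aggregation steps of the abstract can be sketched as a simple loop. This is a minimal illustration only, not the authors' implementation; the four callables (`draft_plan`, `generate_program`, `execute`, `is_consistent`) are hypothetical placeholders for the LLM and solver components.

```python
from collections import Counter

def draft_and_prune(problem, draft_plan, generate_program, execute,
                    is_consistent, n_samples=8):
    """Sketch of the Draft-and-Prune (D&P) loop described in the abstract.

    All four callables are hypothetical stand-ins:
      draft_plan(problem)             -> natural-language plan
      generate_program(problem, plan) -> solver-executable program
      execute(program)                -> solver result, or None on failure
      is_consistent(program, problem) -> False for contradictory/ambiguous
    """
    predictions = []
    for _ in range(n_samples):
        plan = draft_plan(problem)                 # 1) draft a NL plan
        program = generate_program(problem, plan)  # 2) condition generation on it
        result = execute(program)                  # 3) run the symbolic solver
        if result is None:                         # prune non-executable programs
            continue
        if not is_consistent(program, problem):    # prune contradictory/ambiguous ones
            continue
        predictions.append(result)
    if not predictions:
        return None  # caller may fall back to chain-of-thought
    return Counter(predictions).most_common(1)[0][0]  # 4) majority vote
```

Returning `None` when every sample is pruned mirrors the CoT-fallback setting reported for AR-LSAT.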
Paper Type: Long
Research Area: Mathematical, Symbolic, Neurosymbolic, and Logical Reasoning
Research Area Keywords: Mathematical, Symbolic, Neurosymbolic, and Logical Reasoning
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Reproduction study
Languages Studied: First-Order Logic, Logic Programs
Submission Number: 6586