Keywords: Large Language Models, Logical Reasoning, Neuro-Symbolic AI, Auto-formalization
Abstract: Auto-formalization (AF) translates natural-language reasoning problems into solver-executable programs, enabling symbolic solvers to perform sound logical deduction.
In practice, AF pipelines are brittle: programs may fail to execute, or execute but encode incorrect semantics.
We propose Draft-and-Prune (D\&P), an inference-time framework that improves AF-based logical reasoning via diversity and verification.
D\&P first drafts a natural-language plan and conditions program generation on it.
It further prunes executable but contradictory or ambiguous formalizations, and aggregates surviving predictions by majority vote.
Across four benchmarks (AR-LSAT, ProofWriter, PrOntoQA, LogicalDeduction), D\&P substantially strengthens the AF pathway without extra supervision.
On AR-LSAT, it achieves $77.30\%$ accuracy in the AF-only setting and $81.48\%$ with a CoT fallback (GPT-4o), outperforming the strongest AF baseline, CLOVER, by $30.5$ and $18.7$ points, respectively.
It also attains near-ceiling performance on the other benchmarks, reaching 100% accuracy on PrOntoQA and LogicalDeduction under our setup.
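The drafting, pruning, and aggregation steps of the abstract can be sketched as a simple loop. This is a minimal illustration only, not the authors' implementation; the four callables (`draft_plan`, `generate_program`, `execute`, `is_consistent`) are hypothetical placeholders for the LLM and solver components.

```python
from collections import Counter

def draft_and_prune(problem, draft_plan, generate_program, execute,
                    is_consistent, n_samples=8):
    """Sketch of the Draft-and-Prune (D&P) loop described in the abstract.

    All four callables are hypothetical stand-ins:
      draft_plan(problem)             -> natural-language plan
      generate_program(problem, plan) -> solver-executable program
      execute(program)                -> solver result, or None on failure
      is_consistent(program, problem) -> False for contradictory/ambiguous
    """
    predictions = []
    for _ in range(n_samples):
        plan = draft_plan(problem)                 # 1) draft a NL plan
        program = generate_program(problem, plan)  # 2) condition generation on it
        result = execute(program)                  # 3) run the symbolic solver
        if result is None:                         # prune non-executable programs
            continue
        if not is_consistent(program, problem):    # prune contradictory/ambiguous ones
            continue
        predictions.append(result)
    if not predictions:
        return None  # caller may fall back to chain-of-thought
    return Counter(predictions).most_common(1)[0][0]  # 4) majority vote
```

Returning `None` when every sample is pruned mirrors the CoT-fallback setting reported for AR-LSAT.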
Paper Type: Long
Research Area: Mathematical, Symbolic, Neurosymbolic, and Logical Reasoning
Research Area Keywords: Mathematical, Symbolic, Neurosymbolic, and Logical Reasoning
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Reproduction study
Languages Studied: First-Order Logic, Logic Programs
Submission Number: 6586