Abstract: While formal language is well-suited for deductive logical reasoning with language models (LMs), translating natural language (NL) into first-order logic (FOL) often results in errors that are not explicitly addressed in the literature. In the absence of large-scale NL-FOL translation data, we introduce \textsc{ProofFOL}, a high-quality FOL-annotated dataset containing 10.4k examples, each with multiple premises and a conclusion. Additionally, we categorize the FOL translation errors made by LMs and show a significant reduction in translation errors for models trained on \textsc{ProofFOL}. The improvement in FOL translation also boosts the LMs' downstream reasoning performance on several logical reasoning benchmarks.
Paper Type: Short
Research Area: Language Modeling
Research Area Keywords: pre-training, prompting, fine-tuning, logical reasoning, NLP datasets
Contribution Types: Model analysis & interpretability, Data resources, Data analysis
Languages Studied: English
Submission Number: 443