Strategies for Improving NL-to-FOL Translation with LLMs: Data Generation, Incremental Fine-Tuning, and Verification
Abstract: Logical reasoning is a fundamental task in natural language processing that presents significant challenges to Large Language Models (LLMs). While symbolic representations such as first-order logic (FOL) are well suited for logical reasoning, translating natural language (NL) into FOL often introduces errors that remain under-explored. We address this by categorizing the FOL translation errors LLMs make on deductive reasoning tasks and propose methods to improve translation quality, specifically for small (7B) language models. We introduce ProofFOL, a high-quality FOL-annotated subset of the ProofWriter dataset created using GPT-4o. Models fine-tuned on this silver-standard data outperform large (70B) language models. Additionally, for better data utilization in data-scarce settings, we present an incremental framework that combines data augmentation with a novel symbolic translation verification. The augmentation generates additional training data by splitting (premises, conclusion) pairs; fine-tuning on this augmented data improves performance over fine-tuning on the original data alone. Our investigation of the translation errors leads to the generation of a perturbation dataset consisting of simulated NL-to-FOL translation errors and their corresponding corrections, which we use to train a verifier that identifies and corrects potential syntactic and semantic FOL translation errors. Our approach leverages limited human-annotated data, achieving state-of-the-art results on the ProofWriter and ProntoQA datasets.
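A minimal sketch of the splitting augmentation mentioned in the abstract, assuming each training record pairs NL premises and a conclusion with per-sentence FOL annotations; the record layout and the split_record helper are illustrative assumptions, not the authors' actual implementation:

```python
# Hypothetical sketch: split one (premises, conclusion) record into several
# smaller training examples, one per premise prefix, so a single annotated
# record yields multiple supervision signals for fine-tuning.
from typing import List, Tuple


def split_record(premises: List[Tuple[str, str]],
                 conclusion: Tuple[str, str]) -> List[dict]:
    """premises: list of (NL sentence, FOL translation) pairs;
    conclusion: a single (NL, FOL) pair.
    Returns one training example per premise prefix, plus one example
    covering all premises and the conclusion."""
    examples = []
    for k in range(1, len(premises) + 1):
        prefix = premises[:k]
        examples.append({
            "input": [nl for nl, _ in prefix],
            "target": [fol for _, fol in prefix],
        })
    # Full record: all premises followed by the conclusion to translate.
    examples.append({
        "input": [nl for nl, _ in premises] + [conclusion[0]],
        "target": [fol for _, fol in premises] + [conclusion[1]],
    })
    return examples


if __name__ == "__main__":
    demo = split_record(
        [("All cats are animals.", "∀x (Cat(x) → Animal(x))"),
         ("Tom is a cat.", "Cat(Tom)")],
        ("Tom is an animal.", "Animal(Tom)"),
    )
    print(len(demo), "training examples from one annotated record")
```

Under these assumptions, a record with n premises yields n+1 training examples, which is one plausible reading of how the splitting step stretches limited annotated data.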
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: pre-training, prompting, fine-tuning, logical reasoning, NLP datasets
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Data resources, Data analysis
Languages Studied: English
Submission Number: 470