ILENS: Iterative Logical Enhancement via Neurosymbolic Computation and Common Sense

ACL ARR 2024 June Submission2881 Authors

15 Jun 2024 (modified: 09 Jul 2024), ACL ARR 2024 June Submission, CC BY 4.0

Abstract: Trained on internet-scale datasets, large language models (LLMs) excel at tasks that rely on surface patterns and exhibit strong common sense knowledge. However, their performance degrades on tasks that require deeper, multi-step reasoning. Recent techniques aim to combine the strengths of reasoning programs and LLMs by converting natural language problems into formal logic specifications, thereby improving performance on reasoning tasks. Despite these advances, LLMs often struggle with ambiguities and complex cases, which leads to reasoning errors in the formal method step. In this paper, based on the observation that LLMs can supply implicit common sense facts when asked explicitly, we propose iLens (Iterative Logical Enhancement via Neurosymbolic Computation and Common Sense), a new neurosymbolic system for logical inference that integrates the LLM and the theorem prover iteratively. First, we translate the problem specifications into AMR graphs and then convert these into first-order logic (FOL) expressions, which reduces inaccurate interpretations when going from natural language to FOL. Next, we use formal theorem provers (Prover9, Mace4) to deduce the conclusion. When the theorem prover fails to give a definite answer, we ask it to generate counterexamples from the given premises and then prompt the LLM to identify any implicit common sense facts. These facts are added back to the premises, and the proof is attempted again. Through these iterative steps, and using the GPT-4 API together with Prover9 and Mace4, the proposed iLens system significantly reduces uncertain and error cases and achieves 80.22% accuracy on the challenging FOLIO dataset, setting a new state of the art.
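The iterative loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `iterative_proof`, the three callables it takes, the `max_rounds` limit, the string encoding of formulas, and the "True" / "False" / "Uncertain" labels are all hypothetical. The toy functions only stand in for the theorem prover, the counterexample finder, and the LLM, and do not call Prover9, Mace4, or GPT-4.

```python
from typing import Callable, List, Optional

# Hypothetical interfaces (assumptions, not from the paper):
Prove = Callable[[List[str], str], bool]                # stand-in for a prover call
FindModel = Callable[[List[str], str], Optional[str]]   # stand-in for a counterexample search
AskFacts = Callable[[List[str], str, Optional[str]], List[str]]  # stand-in for an LLM call


def iterative_proof(premises: List[str], conclusion: str, prove: Prove,
                    find_model: FindModel, ask_facts: AskFacts,
                    max_rounds: int = 3) -> str:
    """Try to prove or refute the conclusion, adding common sense facts between attempts."""
    facts = list(premises)
    negated = f"-({conclusion})"  # placeholder negation syntax (assumption)
    for round_index in range(max_rounds + 1):
        if prove(facts, conclusion):
            return "True"
        if prove(facts, negated):
            return "False"
        if round_index == max_rounds:
            break
        # No definite answer: get a counterexample and ask for missing facts.
        counterexample = find_model(facts, conclusion)
        suggested = ask_facts(facts, conclusion, counterexample)
        new_facts = [f for f in suggested if f not in facts]
        if not new_facts:
            break  # nothing new to add, so another attempt cannot change the result
        facts.extend(new_facts)
    return "Uncertain"


def toy_prove(facts: List[str], goal: str) -> bool:
    """Toy forward chaining over atoms and single-premise rules written 'a -> b'."""
    known = {f for f in facts if "->" not in f}
    rules = [tuple(part.strip() for part in f.split("->")) for f in facts if "->" in f]
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if body in known and head not in known:
                known.add(head)
                changed = True
    return goal in known


def toy_find_model(facts: List[str], goal: str) -> Optional[str]:
    """Toy counterexample: just a description string."""
    return f"a model where the premises hold but {goal} does not"


def toy_ask_facts(facts: List[str], goal: str, counterexample: Optional[str]) -> List[str]:
    """Toy 'LLM' that always supplies one implicit common sense fact."""
    return ["bird(tweety) -> animal(tweety)"]


premises = ["bird(tweety)", "animal(tweety) -> alive(tweety)"]
result = iterative_proof(premises, "alive(tweety)", toy_prove, toy_find_model, toy_ask_facts)
print(result)
```

In the toy run, the first proof attempt fails because nothing links `bird(tweety)` to `animal(tweety)`; once the stand-in LLM supplies that link, the second attempt succeeds. Passing the prover, model finder, and LLM as callables keeps the loop itself independent of any particular tool.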

Paper Type: Long

Research Area: NLP Applications

Research Area Keywords: commonsense reasoning, prompting, software and tools

Contribution Types: NLP engineering experiment

Languages Studied: English

Submission Number: 2881
