Abstract: Large language models (LLMs) struggle with formal domains that require rigorous logical deduction and symbolic reasoning, such as mathematical proof generation. We propose a neuro-symbolic approach that combines LLMs' generative strengths with structured components to overcome this challenge. As a proof-of-concept, we focus on geometry problems. Our approach is twofold: (1) we retrieve \emph{analogous problems} and use their proofs to guide the LLM, and (2) we employ a \emph{formal verifier} that evaluates the generated proofs and provides feedback, helping the model fix incorrect proofs.
We demonstrate that our method significantly improves proof accuracy for OpenAI's o1 model (58%-70% improvement); both analogous problems and the verifier's feedback contribute to these gains. More broadly, shifting to LLMs that generate provably correct conclusions could dramatically improve their reliability, accuracy, and consistency, unlocking complex tasks and critical real-world applications that require trustworthiness.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: LLM, neuro-symbolic approach, analogy retrieval, analogical guidance, verifier, formal verification, verification feedback, proof generation, math problems, geometry
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models
Languages Studied: English
Previous URL: https://openreview.net/forum?id=21juuXpSst
Explanation Of Revisions PDF: pdf
Reassignment Request Area Chair: Yes, I want a different area chair for our submission
Reassignment Request Reviewers: Yes, I want a different set of reviewers
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: N/A
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: 3
B2 Discuss The License For Artifacts: Yes
B2 Elaboration: 3
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: 3
B4 Data Contains Personally Identifying Info Or Offensive Content: Yes
B4 Elaboration: Ethical Considerations
B5 Documentation Of Artifacts: Yes
B5 Elaboration: 3
B6 Statistics For Data: Yes
B6 Elaboration: 3
C Computational Experiments: No
C1 Model Size And Budget: N/A
C1 Elaboration: We used OpenAI's o1 model via API calls (no GPU, no computing infrastructure needed).
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: 4
C3 Descriptive Statistics: Yes
C3 Elaboration: 5
C4 Parameters For Packages: N/A
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: Yes
E1 Elaboration: Ethical Considerations
Author Submission Checklist: yes
Submission Number: 723