Abstract: Self-Consistency samples diverse reasoning chains with answers and chooses the final answer by majority voting. It relies on forward reasoning alone and, once saturated, cannot improve further by sampling more reasoning chains. To further boost performance, we introduce backward reasoning to verify candidate answers. Specifically, for mathematical tasks, we mask a number in the question and ask the LLM to answer a backward question created by a simple template, i.e., to predict the masked number when a candidate answer is provided. Instead of using forward or backward reasoning alone, we propose FOBAR to combine FOrward and BAckward Reasoning for verification. Extensive experiments on six standard mathematical data sets and three LLMs show that FOBAR achieves state-of-the-art performance. In particular, FOBAR outperforms Self-Consistency, which uses forward reasoning alone, demonstrating that combining forward and backward reasoning is better. In addition, FOBAR performs better than existing verification methods, showing the effectiveness of the simple template used in backward reasoning and the proposed combination. Extensions to non-mathematical problems are also discussed and validated empirically.
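The forward/backward combination described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the scoring rule here (forward vote share multiplied by backward verification accuracy) and all names are illustrative assumptions.

```python
from collections import Counter

def fobar_vote(forward_answers, backward_checks):
    """Pick a final answer by combining forward and backward reasoning.

    forward_answers: candidate answers produced by sampled forward chains.
    backward_checks: maps a candidate answer to a list of bools, each
        recording whether one backward question (a number in the original
        question masked, the candidate answer assumed true) was answered
        correctly by the LLM.
    The product score below is an illustrative combination choice,
    not necessarily the paper's exact formula.
    """
    forward_counts = Counter(forward_answers)
    total = len(forward_answers)
    scores = {}
    for ans, count in forward_counts.items():
        checks = backward_checks.get(ans, [])
        backward_acc = sum(checks) / len(checks) if checks else 0.0
        scores[ans] = (count / total) * backward_acc
    return max(scores, key=scores.get)

# Toy example: "18" wins forward voting 3-2 and verifies backward more often.
forward = ["18", "18", "18", "20", "20"]
backward = {"18": [True, True, False], "20": [False, False, True]}
print(fobar_vote(forward, backward))  # -> 18
```

Note that plain Self-Consistency corresponds to dropping the backward factor and returning the majority answer of `forward_answers` directly.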
Paper Type: long
Research Area: Efficient/Low-Resource Methods for NLP
Contribution Types: Approaches to low-resource settings
Languages Studied: English
Preprint Status: There is a non-anonymous preprint (URL specified in the next question).
A1: yes
A1 Elaboration For Yes Or No: Section 6
A2: yes
A2 Elaboration For Yes Or No: Section 6
A3: yes
A3 Elaboration For Yes Or No: Abstract and Section 1
B: no
B1: yes
B1 Elaboration For Yes Or No: Section 4.1
B2: no
B2 Elaboration For Yes Or No: The licenses of the data sets used are stated in their original publications.
B3: yes
B3 Elaboration For Yes Or No: Section 4.1
B4: no
B4 Elaboration For Yes Or No: The data sets used contain no names, information that uniquely identifies individuals, or offensive content, as discussed in their original publications.
B5: no
B5 Elaboration For Yes Or No: Documentation of the data sets used in this work is provided in their original publications.
B6: yes
B6 Elaboration For Yes Or No: Appendix F
C: yes
C1: no
C1 Elaboration For Yes Or No: Experiments are based on OpenAI's LLMs (e.g., GPT-3.5-Turbo), whose parameter counts are not released.
C2: yes
C2 Elaboration For Yes Or No: Section 4.1
C3: yes
C3 Elaboration For Yes Or No: Section 4.1
C4: yes
C4 Elaboration For Yes Or No: Section 4.1
D: no
D1: no
D1 Elaboration For Yes Or No: This work does not use human annotators.
D2: no
D2 Elaboration For Yes Or No: This work does not use human annotators.
D3: no
D3 Elaboration For Yes Or No: This work does not use human annotators.
D4: no
D4 Elaboration For Yes Or No: This work does not use human annotators.
D5: no
D5 Elaboration For Yes Or No: This work does not use human annotators.
E: no
E1: no
E1 Elaboration For Yes Or No: This work does not use AI assistants.