Abstract: Self-Consistency samples diverse reasoning chains with answers and chooses the final answer by majority voting. It relies on forward reasoning alone and, once saturated, cannot improve further by sampling more reasoning chains. To further boost performance, we introduce backward reasoning to verify candidate answers. Specifically, for mathematical tasks, we mask a number in the question and ask the LLM to answer a backward question created by a simple template, i.e., to predict the masked number when a candidate answer is provided. Instead of using forward or backward reasoning alone, we propose FOBAR to combine FOrward and BAckward Reasoning for verification. Extensive experiments on six standard mathematical data sets and three LLMs show that FOBAR achieves state-of-the-art performance. In particular, FOBAR outperforms Self-Consistency, which uses forward reasoning alone, demonstrating that combining forward and backward reasoning is better. In addition, FOBAR performs better than existing verification methods, showing the effectiveness of the simple template used in backward reasoning and the proposed combination. Extensions to non-mathematical problems are also discussed and validated empirically.
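The forward/backward combination described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the scoring rule here (forward vote share multiplied by backward verification accuracy) and all names are illustrative assumptions.

```python
from collections import Counter

def fobar_vote(forward_answers, backward_checks):
    """Pick a final answer by combining forward and backward reasoning.

    forward_answers: candidate answers produced by sampled forward chains.
    backward_checks: maps a candidate answer to a list of bools, each
        recording whether one backward question (a number in the original
        question masked, the candidate answer assumed true) was answered
        correctly by the LLM.
    The product score below is an illustrative combination choice,
    not necessarily the paper's exact formula.
    """
    forward_counts = Counter(forward_answers)
    total = len(forward_answers)
    scores = {}
    for ans, count in forward_counts.items():
        checks = backward_checks.get(ans, [])
        backward_acc = sum(checks) / len(checks) if checks else 0.0
        scores[ans] = (count / total) * backward_acc
    return max(scores, key=scores.get)

# Toy example: "18" wins forward voting 3-2 and verifies backward more often.
forward = ["18", "18", "18", "20", "20"]
backward = {"18": [True, True, False], "20": [False, False, True]}
print(fobar_vote(forward, backward))  # -> 18
```

Note that plain Self-Consistency corresponds to dropping the backward factor and returning the majority answer of `forward_answers` directly.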
Paper Type: long
Research Area: Efficient/Low-Resource Methods for NLP
Contribution Types: Approaches to low-resource settings
Languages Studied: English
Preprint Status: There is a non-anonymous preprint (URL specified in the next question).
A1: yes
A1 Elaboration For Yes Or No: Section 6
A2: yes
A2 Elaboration For Yes Or No: Section 6
A3: yes
A3 Elaboration For Yes Or No: Abstract and Section 1
B: no
B1: yes
B1 Elaboration For Yes Or No: Section 4.1
B2: no
B2 Elaboration For Yes Or No: The licenses of the data sets used are stated in their original publications.
B3: yes
B3 Elaboration For Yes Or No: Section 4.1
B4: no
B4 Elaboration For Yes Or No: The data sets used contain no names, information that uniquely identifies individuals, or offensive content, as discussed in their original publications.
B5: no
B5 Elaboration For Yes Or No: Documentation of the data sets used in this work is provided in their original publications.
B6: yes
B6 Elaboration For Yes Or No: Appendix F
C: yes
C1: no
C1 Elaboration For Yes Or No: Experiments are based on OpenAI's LLMs (e.g., GPT-3.5-Turbo), whose parameter counts are not released.
C2: yes
C2 Elaboration For Yes Or No: Section 4.1
C3: yes
C3 Elaboration For Yes Or No: Section 4.1
C4: yes
C4 Elaboration For Yes Or No: Section 4.1
D: no
D1: no
D1 Elaboration For Yes Or No: This work does not use human annotators.
D2: no
D2 Elaboration For Yes Or No: This work does not use human annotators.
D3: no
D3 Elaboration For Yes Or No: This work does not use human annotators.
D4: no
D4 Elaboration For Yes Or No: This work does not use human annotators.
D5: no
D5 Elaboration For Yes Or No: This work does not use human annotators.
E: no
E1: no
E1 Elaboration For Yes Or No: This work does not use AI assistants.