Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
Abstract: Recently, there has been significant progress in teaching language models to perform step-by-step reasoning to solve complex numerical reasoning tasks. Chain-of-thought prompting (CoT) is the state-of-the-art method for many of these tasks. CoT uses language models to produce text that describes the reasoning, carries out the computation, and finally states the answer to a question. Here we propose `Program of Thoughts' (PoT), which uses language models (mainly Codex) to generate both text and programming language statements, and finally an answer. In PoT, the computation is delegated to a program interpreter, which executes the generated program, thus decoupling complex computation from reasoning and language understanding. We evaluate PoT on five math word problem datasets and three financial-QA datasets in both few-shot and zero-shot settings. We find that PoT achieves an average performance gain of around 12% over CoT across all datasets. By combining PoT with self-consistency decoding, we achieve extremely strong performance on all the math and financial datasets. All of our data and code will be released.
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
1. Added ChatGPT, CodeGen, CodeT5+, and Xgen results
2. Fixed the citation issue
3. Added a comparison with PaL
4. Added a paragraph in related work comparing with contemporary work
5. Fixed other minor writing issues
6. Added figures and a description of "CoT as intermediate step"
7. Added an explanation of the "calculator" baseline
8. Added GPT-4 results
9. Added prompt exemplars in the appendix
Assigned Action Editor: ~Karthik_R_Narasimhan1
Submission Number: 1297