MathSensei: Mathematical Reasoning with a Tool-Augmented Large Language Model

ICLR 2024 Workshop ME-FoMo Submission 89 Authors

Published: 04 Mar 2024, Last Modified: 06 May 2024. ME-FoMo 2024 Poster. License: CC BY 4.0
Keywords: LLM, Tool-Augmented LLM, TALM, reasoning, mathematical reasoning, MathSensei
Abstract: Large Language Models augmented with specialized tools or modules, commonly referred to as TALMs, show superior reasoning abilities over generic LLMs across diverse knowledge-intensive Question Answering (QA) tasks. However, their efficacy on complex mathematical reasoning benchmarks has remained largely unexplored. Moreover, existing research lacks a study of the complementary benefits offered by diverse tool-sets for solving mathematical problems. In this work, we present a TALM-based framework, MATHSENSEI, which is powered by a knowledge retriever (LLM or Bing Web Search), a program generator and executor (Python), and a symbolic problem solver (Wolfram Alpha). We perform extensive ablations with various tool combinations across multiple math sub-disciplines of different datasets. Our experiments also include evaluations of well-known planning algorithms such as ReAct and Plan-and-Solve. MATHSENSEI outperforms gpt-3.5-turbo with chain-of-thought (CoT) prompting by 13.5% on the MATH dataset. We observe that the benefits of TALMs grow with problem complexity (e.g., on AQuA, MMLU-Math, and the higher-difficulty levels of MATH), while they offer minimal gains on simpler math word problems (such as GSM-8K). The code and data are available at https://github.com/Debrup-61/MathSensei.
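For intuition, here is a minimal, hypothetical sketch (in Python) of how such a tool pipeline might be composed: a retriever supplies context, an LLM-backed generator emits a program, and an executor runs it and returns the output. All function names, the stubbed retriever, and the hard-coded "generated" program are illustrative assumptions, not the authors' code; the symbolic-solver module (Wolfram Alpha) is omitted here since it requires an external API. The actual implementation is in the repository linked above.

```python
# A minimal, hypothetical sketch of a MathSensei-style tool pipeline, assuming a
# sequential composition of modules. Every name below is an illustrative stand-in.
import subprocess
import sys
import tempfile


def knowledge_retriever(question: str) -> str:
    """Stand-in for the knowledge-retrieval module (an LLM or Bing Web Search)."""
    return f"Background facts relevant to: {question}"


def program_generator(question: str, context: str) -> str:
    """Stand-in for the program-generator module; in the real system an LLM
    would emit this code conditioned on the question and retrieved context."""
    return "print(sum(range(1, 101)))"  # hypothetical program for a toy question


def program_executor(code: str) -> str:
    """Execute generated Python in a subprocess and capture its stdout."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, timeout=30
    )
    return result.stdout.strip()


def solve(question: str) -> str:
    """Chain retriever -> program generator -> executor for one question."""
    context = knowledge_retriever(question)
    code = program_generator(question, context)
    return program_executor(code)


if __name__ == "__main__":
    print(solve("What is the sum of the first 100 positive integers?"))  # 5050
```

Running generated code in a subprocess (rather than via `eval`) is one simple way to isolate the executor from the orchestrating process; the paper's ablations then swap tools in and out of such a chain to measure their complementary benefits.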
Submission Number: 89