Keywords: LLM, Tool-Augmented LLM, TALM, reasoning, mathematical reasoning, MathSensei
Abstract: Large Language Models (LLMs) augmented with specialized tools or modules, commonly referred to as Tool-Augmented LLMs (TALMs), show superior reasoning abilities over generic LLMs across knowledge-intensive Question Answering (QA) tasks. However, their efficacy on complex mathematical reasoning benchmarks has remained largely unexplored. Moreover, existing research lacks a study of the complementary benefits offered by diverse tool-sets toward solving mathematical problems. In this work, we present a TALM-based framework, MATHSENSEI, which is powered by a knowledge retriever (LLM or Bing Web Search), a program generator and executor (Python), and a symbolic problem solver (Wolfram Alpha). We perform extensive ablations with various tool combinations across multiple math sub-disciplines of different datasets. Our experiments also include evaluations of well-known planning algorithms such as ReAct and Plan-and-Solve. MATHSENSEI outperforms gpt-3.5-turbo with chain-of-thought (CoT) prompting by 13.5% on the MATH dataset. We observe that TALMs become increasingly beneficial as problem complexity grows (such as AQuA, MMLU-Math, and higher-level questions in MATH), but offer minimal benefits on simpler math word problems (such as GSM-8K). The code and data are available at https://github.com/Debrup-61/MathSensei.
Submission Number: 89