CHAMP: A Competition-Level Dataset for Fine-Grained Analyses of LLMs' Mathematical Reasoning Capabilities
Keywords: Math Dataset, LLM Math Reasoning, LLM Evaluation
TL;DR: We curate a new dataset of challenging competition-level math problems and conduct a fine-grained analysis of math reasoning abilities of LLMs on it.
Abstract: Recent large language models (LLMs) have shown indications of mathematical reasoning ability. However, for challenging math problems, it is unclear what information about the problem helps (or hurts) model performance. In this paper, we propose a challenging benchmark dataset for such analyses. The Concept and Hint-Annotated Math Problems, or CHAMP, consists of competition-level math problems annotated with "concepts," or general math facts, and "hints," or problem-specific tricks. These annotations and their interconnections allow us to explore the effects of additional information, such as relevant hints, misleading concepts, or related problems. We conduct 12 preliminary studies with 4 models, summarize our findings, and discuss how CHAMP supports broader discussions of LLMs' capabilities to understand and use context. The dataset, code, and an extended version of the paper are available on the project website at https://yujunmao1.github.io/CHAMP.
Submission Number: 14