FormulaReasoning: A Dataset for Formula-Based Numerical Reasoning

27 May 2024 (modified: 13 Nov 2024), Submitted to NeurIPS 2024 Track Datasets and Benchmarks, CC BY 4.0
Keywords: formula, numerical reasoning, question answering
Abstract: The application of formulas is a fundamental human ability in solving numerical reasoning problems. However, existing numerical reasoning datasets seldom explicitly indicate the formulas employed in the reasoning steps. To bridge this gap, we propose FormulaReasoning, a dataset for formula-based numerical reasoning consisting of 5,420 questions. We evaluate LLMs ranging in size from 7B to over 100B parameters using zero-shot and few-shot chain-of-thought methods, and we explore retrieval-augmented LLMs that are provided with an external formula database. We also divide the reasoning process into formula generation, parameter extraction, and calculation, and use data augmentation to enhance models with fewer than 7B parameters. Our empirical findings underscore the significant room for improvement in existing models on the complex, formula-driven questions of FormulaReasoning.
Supplementary Material: pdf
Submission Number: 956
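
The three-stage decomposition named in the abstract (formula generation, parameter extraction, calculation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the question, the `Formula` class, and the functions `generate_formulas`, `extract_parameters`, and `calculate` are hypothetical names introduced here, and the first two stages are hard-coded stand-ins for what a model would produce.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Formula:
    """One formula step: computes `target` from the symbols in `inputs`."""
    target: str
    inputs: Tuple[str, ...]
    fn: Callable[..., float]


def generate_formulas(question: str) -> List[Formula]:
    # Stage 1 (formula generation). Hypothetical stand-in: a real system
    # would have a model propose these formulas from the question text.
    return [
        Formula("W", ("F", "s"), lambda F, s: F * s),  # work = force * distance
        Formula("P", ("W", "t"), lambda W, t: W / t),  # power = work / time
    ]


def extract_parameters(question: str) -> Dict[str, float]:
    # Stage 2 (parameter extraction). Hypothetical stand-in: values are
    # hard-coded here instead of being read from the question by a model.
    return {"F": 200.0, "s": 3.0, "t": 10.0}


def calculate(formulas: List[Formula], params: Dict[str, float]) -> Dict[str, float]:
    # Stage 3 (calculation). Apply the formulas in order; each result
    # becomes available as an input to later formulas.
    values = dict(params)
    for formula in formulas:
        args = [values[name] for name in formula.inputs]
        values[formula.target] = formula.fn(*args)
    return values


# Illustrative question, not taken from the dataset.
question = "A force of 200 N moves a crate 3 m in 10 s. What is the average power?"
values = calculate(generate_formulas(question), extract_parameters(question))
answer = values["P"]
print(f"W = {values['W']} J, P = {answer} W")
```

Keeping the calculation stage separate from the two model-driven stages means the arithmetic is done deterministically, so an error can be traced to either a wrong formula or a wrong extracted value.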