A Survey for Multimodal Mathematical Reasoning

CVPR 2023 Workshop NFVLR Submission 4 Authors

Published: 12 Jun 2023, Last Modified: 13 Jun 2023
Venue: NFVLR 2023 Poster
Keywords: multimodal, mathematical reasoning, prompts, instruction tuning
TL;DR: We present a comprehensive survey of multimodal numerical and mathematical reasoning.
Abstract: Multimodal numerical reasoning, the ability to reason with and integrate information across multiple modalities, has become an increasingly important area of research in both natural language processing (NLP) and computer vision (CV). It aims to extract information from multiple input modalities, such as text and images, and integrate it into a coherent conclusion. In this survey, we review recent advances in multimodal numerical reasoning, covering datasets, evaluation metrics, and methods. In particular, we focus on the emerging out-of-the-box capabilities of large language models (LLMs) on arithmetic, commonsense, and symbolic reasoning tasks. Because the model's publicly available functionality is limited, our experiments on GPT-3.5 Turbo's mathematical information extraction cover only a single modality. We also outline remaining limitations and future research directions in this field, including the need for more comprehensive benchmarks and for models that can reason over more complex and diverse modalities.
Submission Number: 4