UniMath: A Foundational and Multimodal Mathematical Reasoner

Published: 07 Oct 2023, Last Modified: 01 Dec 2023EMNLP 2023 MainEveryoneRevisionsBibTeX
Submission Type: Regular Short Paper
Submission Track: NLP Applications
Submission Track 2: Question Answering
Keywords: Multimodal Math Reasoning, Math Word Problem Solving, Geometry Problem Solving
Abstract: While significant progress has been made in natural language processing (NLP), existing methods exhibit limitations in effectively interpreting and processing diverse mathematical modalities. Therefore, we introduce UniMath, a versatile and unified system designed for multimodal mathematical reasoning tasks. Tackling complex problem-solving in arithmetic, geometry, and table-based math, UniMath utilizes a fine-tuned T5 model augmented with a variational autoencoder (VAE)-based image tokenizer. By jointly training and evaluating the model on three diverse datasets - SVAMP, GeoQA, and TableMWP, UniMath achieves state-of-the-art performance. The model's generalization ability is further demonstrated via fine-tuning on two additional datasets, MathQA and Geo-Proving. Through comprehensive evaluations, we showcase that joint training across diverse math tasks improves overall model performance and enhances its ability to generalize across different mathematical reasoning tasks. This pioneering approach provides a blueprint and inspires further efforts on unified mathematical reasoning with deep learning systems.
Submission Number: 1999
Loading