UniFinEval: Towards Unified Evaluation of Financial Multimodal Models across Text, Images, and Videos
Keywords: financial domain, benchmark, multimodal large language model, multimodal QA
Abstract: Multimodal large language models (MLLMs) are playing an increasingly significant role in empowering the financial domain; however, the challenges they face, such as high-density multimodal information and cross-modal multi-hop reasoning, go beyond the evaluation scope of existing multimodal benchmarks. To address this gap, we propose UniFinEval, the first unified multimodal benchmark designed for high-information-density financial environments, covering text, images, and videos. UniFinEval systematically constructs five core financial scenarios grounded in real-world financial systems: Financial Statement Auditing, Company Fundamental Reasoning, Industry Trend Insights, Financial Risk Sensing, and Asset Allocation Analysis. We manually construct a high-quality dataset of 3,767 question-answer pairs in both Chinese and English and systematically evaluate 10 mainstream MLLMs under zero-shot and CoT settings. Results show that Gemini-3-pro-preview achieves the best overall performance, yet still exhibits a substantial gap relative to financial experts. Further error analysis reveals systematic deficiencies in current models. UniFinEval aims to provide a systematic assessment of MLLMs' capabilities in fine-grained, high-information-density financial environments, thereby enhancing the robustness of MLLM applications in real-world financial scenarios. Data and code are available at https://anonymous.4open.science/r/anonym4B75.
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: finance, multimodal large language model, multimodal QA, knowledge base QA, logical reasoning QA, open-domain QA
Contribution Types: Data resources, Data analysis
Languages Studied: Chinese, English
Submission Number: 1459