MAEV: Multimodal Automatic Evaluation Framework

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission · Readers: Everyone
Abstract: We propose MAEV, a novel framework for evaluating the performance of multimodal large language models (MM-LLMs) on complex, open-ended multimodal reasoning tasks. Our approach mitigates the limitations of conventional classification-based MM-LLM evaluation methods by leveraging state-of-the-art LLMs to provide a comprehensive analysis of free-form MM-LLM responses. To this end, we introduce two carefully crafted evaluation datasets comprising 2K ground-truth long-form responses to open-ended visual queries and detailed image descriptions. Our experimental results demonstrate the effectiveness of MAEV: it aligns closely with human evaluation outcomes and offers a scalable complement to the time-consuming manual assessment process. This framework has the potential to accelerate the development of cutting-edge MM-LLMs.
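The submission page includes no code, but as a rough illustration of the LLM-as-judge scoring the abstract describes, here is a minimal sketch. The `score_response` helper, the prompt wording, and the 1–10 scale are assumptions for illustration only, not MAEV's actual protocol; the judge is injected as a plain callable so any strong LLM API can back it.

```python
import re
from typing import Callable

# Hypothetical judge prompt; MAEV's real template and scale are not public.
JUDGE_PROMPT = """You are grading a model's answer to an open-ended visual question.
Question: {question}
Ground-truth answer: {reference}
Model answer: {candidate}
Rate the model answer from 1 (poor) to 10 (matches the ground truth) and reply
with only the number."""


def score_response(question: str, reference: str, candidate: str,
                   judge: Callable[[str], str]) -> float:
    """Ask an LLM judge to score a free-form answer against a ground-truth one."""
    reply = judge(JUDGE_PROMPT.format(question=question, reference=reference,
                                      candidate=candidate))
    match = re.search(r"\d+(?:\.\d+)?", reply)  # pull the first number from the reply
    if match is None:
        raise ValueError(f"judge returned no score: {reply!r}")
    return float(match.group())


if __name__ == "__main__":
    # Stub judge for demonstration; in practice this would call a strong LLM API.
    stub = lambda prompt: "8"
    print(score_response("What is unusual in the image?",
                         "A cat is wearing sunglasses indoors.",
                         "The cat has sunglasses on.",
                         stub))
```

Injecting the judge as a callable keeps the sketch self-contained and makes it easy to swap in whichever evaluator LLM is available.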
Paper Type: short
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English