MAEV: Multimodal Automatic Evaluation Framework

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission · Readers: Everyone
Abstract: We propose MAEV, a novel framework for evaluating the performance of multimodal large language models (MM-LLMs) on complex, open-ended multimodal reasoning tasks. Our approach mitigates the limitations of conventional classification-based MM-LLM evaluation methods by leveraging state-of-the-art LLMs to provide a comprehensive analysis of free-form MM-LLM responses. To this end, we introduce two carefully crafted evaluation datasets comprising 2K ground-truth long-form responses to open-ended visual queries and detailed image descriptions. Our experimental results demonstrate the effectiveness of MAEV: it aligns closely with human evaluation outcomes and offers a scalable complement to the time-consuming manual assessment process. This framework has the potential to accelerate the development of cutting-edge MM-LLMs.
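The submission page includes no code, but as a rough illustration of the LLM-as-judge scoring the abstract describes, here is a minimal sketch. The `score_response` helper, the prompt wording, and the 1–10 scale are assumptions for illustration only, not MAEV's actual protocol; the judge is injected as a plain callable so any strong LLM API can back it.

```python
import re
from typing import Callable

# Hypothetical judge prompt; MAEV's real template and scale are not public.
JUDGE_PROMPT = """You are grading a model's answer to an open-ended visual question.
Question: {question}
Ground-truth answer: {reference}
Model answer: {candidate}
Rate the model answer from 1 (poor) to 10 (matches the ground truth) and reply
with only the number."""


def score_response(question: str, reference: str, candidate: str,
                   judge: Callable[[str], str]) -> float:
    """Ask an LLM judge to score a free-form answer against a ground-truth one."""
    reply = judge(JUDGE_PROMPT.format(question=question, reference=reference,
                                      candidate=candidate))
    match = re.search(r"\d+(?:\.\d+)?", reply)  # pull the first number from the reply
    if match is None:
        raise ValueError(f"judge returned no score: {reply!r}")
    return float(match.group())


if __name__ == "__main__":
    # Stub judge for demonstration; in practice this would call a strong LLM API.
    stub = lambda prompt: "8"
    print(score_response("What is unusual in the image?",
                         "A cat is wearing sunglasses indoors.",
                         "The cat has sunglasses on.",
                         stub))
```

Injecting the judge as a callable keeps the sketch self-contained and makes it easy to swap in whichever evaluator LLM is available.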
Paper Type: short
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English