MM-InstructEval: Zero-shot evaluation of (Multimodal) Large Language Models on multimodal reasoning tasks
Highlights:
- We propose MM-InstructEval with metrics for efficacy, robustness, and adaptability.
- We evaluate 45 models on 16 datasets with 10 instructions across 6 multimodal tasks.
- We assess LLMs and MLLMs, revealing insights into their performance on multimodal tasks.