Submission Track: LLMs for Materials Science - Full Paper
Submission Category: All of the above
Keywords: LLM, Benchmark, MLM, Multimodal
TL;DR: We present a dataset for benchmarking multimodal language models.
Abstract: We present MaCBench, a multimodal benchmark for evaluating AI models in chemistry and materials science tasks.
This benchmark addresses the lack of comprehensive, domain-specific evaluation tools for multimodal AI in scientific contexts. MaCBench comprises 628 questions spanning three key areas: fundamental scientific understanding, data extraction from visual information, and practical laboratory knowledge. It pairs diverse visual inputs, such as laboratory images, band structures, crystal structures, and atomic force microscopy images, with multiple-choice questions. We evaluate state-of-the-art multimodal AI models (GPT-4o, Claude-3.5-Sonnet, Gemini-1.5-Pro) on MaCBench, revealing significant performance variations across tasks and skills.
While the models excel at basic pattern recognition and information retrieval, they struggle with complex reasoning and with applying scientific principles to novel situations. Notably, we observe a disconnect between object recognition and contextual understanding in laboratory safety scenarios. MaCBench provides crucial insights into the capabilities and limitations of multimodal AI in chemistry and materials science, serving as a valuable tool for guiding the development of more capable AI systems for scientific research.
AI4Mat Journal Track: Yes
Submission Number: 50