Keywords: Multimodal LLM, Reasoning, Visual Context, Benchmark
TL;DR: LogicVista is a benchmark evaluating MLLMs' logical reasoning in visual contexts, using 448 annotated multiple-choice questions across 5 tasks and 11 capabilities, with 11 MLLMs comprehensively tested.
Abstract: We propose LogicVista, an evaluation benchmark that examines multimodal large language models' (MLLMs) integrated logical reasoning capabilities in visual contexts. Recent advancements in MLLMs have demonstrated various fascinating abilities, from crafting poetry based on an image to engaging in mathematical reasoning. Despite these feats, there remains a gap in the systematic examination of MLLMs' proficiency in logical reasoning tasks, skills that are routinely invoked in navigation, puzzle-solving, and similar settings. Thus, we present LogicVista, which evaluates general logical cognition abilities across a spectrum of 5 logical reasoning tasks with 3 broad capabilities and 11 specific capabilities through a sample of 448 multiple-choice questions. Each question is annotated with not only the correct answer but also the human-written reasoning behind the selection, enabling rich open-ended evaluation as well as MCQ evaluation. A total of 11 MLLMs undergo comprehensive evaluation using LogicVista. We also introduce a crowdsourced annotation tool to further scale LogicVista with support from the community. Code and data are available at https://anonymous.4open.science/r/LogicVista.
Primary Area: datasets and benchmarks
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 13059