Unsolvable Problem Detection: Evaluating Trustworthiness of Large Multimodal Models

Atsuyuki Miyai; Jingkang Yang; Jingyang Zhang; Yifei Ming; Qing Yu; Go Irie; Yixuan Li; Hai Li; Ziwei Liu; Kiyoharu Aizawa

13 Sept 2024 (modified: 05 Feb 2025), Submitted to ICLR 2025, CC BY 4.0

Keywords: Large Multimodal Models; Benchmark; Trustworthy AI

TL;DR: Unsolvable Problem Detection: Evaluating Trustworthiness of Large Multimodal Models

Abstract: This paper introduces a novel and well-defined challenge for Large Multimodal Models (LMMs), termed Unsolvable Problem Detection (UPD). UPD examines an LMM's ability to withhold answers when faced with unsolvable problems. It encompasses three problems: Absent Answer Detection (AAD), Incompatible Answer Set Detection (IASD), and Incompatible Visual Question Detection (IVQD), which cover unsolvable cases such as a missing correct answer, answer choices incompatible with the question, and image-question mismatches. We introduce MM-UPD Bench, a benchmark for assessing performance across various ability dimensions. Our experiments reveal that most LMMs, even those that perform adequately on existing benchmarks, struggle significantly with MM-UPD, underscoring a novel aspect of trustworthiness that current benchmarks have overlooked. To deepen the understanding of UPD, we explore various solutions, including chain of thought, self-reflection, and instruction tuning, and demonstrate each approach's efficacy and limitations. We hope our insights, together with future efforts within the proposed UPD settings, will enhance the broader understanding and development of more practical and reliable LMMs.
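
As an illustration of the three UPD settings named in the abstract, the following minimal sketch derives an unsolvable variant of each kind from one ordinary multiple-choice item. The `MCQ` schema, the `make_*` helper names, and the construction rules (dropping the correct option, swapping in unrelated options, swapping in an unrelated image) are assumptions made for this example; they are not taken from the MM-UPD Bench code.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class MCQ:
    """A multiple-choice visual question (hypothetical schema)."""
    image_id: str
    question: str
    options: List[str]
    answer: Optional[str]  # None means no listed option is correct


def make_aad(item: MCQ) -> MCQ:
    """Absent Answer Detection: remove the correct option from the choices."""
    kept = [o for o in item.options if o != item.answer]
    return replace(item, options=kept, answer=None)


def make_iasd(item: MCQ, unrelated_options: List[str]) -> MCQ:
    """Incompatible Answer Set Detection: use choices unrelated to the question."""
    return replace(item, options=list(unrelated_options), answer=None)


def make_ivqd(item: MCQ, unrelated_image_id: str) -> MCQ:
    """Incompatible Visual Question Detection: pair the question with a mismatched image."""
    return replace(item, image_id=unrelated_image_id, answer=None)


# A solvable base item and its three unsolvable variants (illustrative data).
base = MCQ("img_001", "What color is the car?", ["red", "blue", "green"], "red")
aad = make_aad(base)
iasd = make_iasd(base, ["Paris", "Tokyo", "Cairo"])
ivqd = make_ivqd(base, "img_dog_042")
```

In all three variants `answer` is `None`, so under this sketch the desired model behavior is to withhold an answer rather than pick one of the listed options.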

Supplementary Material: zip

Primary Area: datasets and benchmarks

Submission Number: 51
