Abstract: Despite significant strides in multimodal tasks, Multimodal Large Language Models (MLLMs) are plagued by the critical issue of hallucination. The reliable detection of such hallucinations in MLLMs has, therefore, become a vital aspect of model evaluation and the safeguarding of practical application deployment. Prior research in this domain has been constrained by a narrow focus on singular tasks, an inadequate range of hallucination categories addressed, and a lack of detailed granularity. In response to these challenges, our work expands the investigative horizons of hallucination detection. We present a novel meta-evaluation benchmark, MHaluBench, meticulously crafted to facilitate the evaluation of advancements in hallucination detection methods. Additionally, we unveil a novel unified multimodal hallucination detection framework, UNIHD, which leverages a suite of auxiliary tools to validate the occurrence of hallucinations robustly. We demonstrate the effectiveness of UNIHD through meticulous evaluation and comprehensive analysis. We also provide strategic insights on the application of specific tools for addressing various categories of hallucinations.
Paper Type: long
Research Area: Resources and Evaluation
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Preprint Status: There is a non-anonymous preprint (URL specified in the next question).
A1: yes
A1 Elaboration For Yes Or No: Limitations section
A2: yes
A2 Elaboration For Yes Or No: Limitations section
A3: yes
A3 Elaboration For Yes Or No: Section 1
B: no
B1: yes
B1 Elaboration For Yes Or No: Section Reference
B2: no
B2 Elaboration For Yes Or No: No license involved
B3: n/a
B3 Elaboration For Yes Or No: n/a
B4: no
B4 Elaboration For Yes Or No: No such information involved
B5: n/a
B5 Elaboration For Yes Or No: n/a
B6: yes
B6 Elaboration For Yes Or No: Section 3
C: yes
C1: no
C1 Elaboration For Yes Or No: We are using a large closed-source model and do not know the parameters.
C2: yes
C2 Elaboration For Yes Or No: Section 5.1
C3: yes
C3 Elaboration For Yes Or No: Section 5.2
C4: n/a
C4 Elaboration For Yes Or No: Not involved
D: yes
D1: yes
D1 Elaboration For Yes Or No: Appendix
D2: n/a
D2 Elaboration For Yes Or No: n/a
D3: yes
D3 Elaboration For Yes Or No: Section 3
D4: n/a
D4 Elaboration For Yes Or No: No ethics issues involved
D5: no
D5 Elaboration For Yes Or No: The amount of labeled data is small, and only a few annotators were involved.
E: yes
E1: yes
E1 Elaboration For Yes Or No: Section 3; we conduct experiments based on ChatGPT and GPT-4V.