FIHA: Fine-grained Hallucinations Evaluations in Large Vision Language Models

ACL ARR 2024 June Submission 323 Authors

10 Jun 2024 (modified: 02 Aug 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: The rapid development of Large Vision Language Models (LVLMs) is often accompanied by widespread hallucination issues, making cost-effective and comprehensive assessment increasingly vital. To this end, we introduce FIHA (Fine-graIned Hallucination evAluation), a multidimensional hallucination evaluation method for LVLMs that is LLM-free and annotation-free. FIHA can generate QA pairs on any image dataset at minimal cost, enabling hallucination assessment from both images and captions. Based on this approach, we introduce a benchmark (FIHA-v1) consisting of diverse questions on various images from MS COCO and Foggy Cityscapes. Furthermore, we use the Davidson Scene Graph (DSG) to organize the structure among QA pairs, which increases the reliability of the evaluation. We evaluate representative models using FIHA-v1, highlighting their limitations and challenges. Our code and data can be found here: https://anonymous.4open.science/r/FIHA-45BB
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, automatic evaluation of datasets, evaluation methodologies, reproducibility
Contribution Types: Data resources, Data analysis
Languages Studied: English
Submission Number: 323