FIHA: Fine-grained Hallucinations Evaluations in Large Vision Language Models

Abstract: The rapid development of Large Vision Language Models (LVLMs) has been accompanied by widespread hallucination issues, making cost-effective and comprehensive assessment increasingly vital. We therefore introduce FIHA (Fine-graIned Hallucination evAluation), a multidimensional hallucination evaluation method for LVLMs that is LLM-free and annotation-free. FIHA can generate QA pairs on any image dataset at minimal cost, enabling hallucination assessment from both images and captions. Based on this approach, we introduce a benchmark, FIHA-v1, consisting of diverse questions on various images from MS COCO and Foggy Cityscapes. Furthermore, we use the Davidson Scene Graph (DSG) to organize the structure among QA pairs, which increases the reliability of the evaluation. We evaluate representative models using FIHA-v1, highlighting their limitations and challenges. Our code and data can be found here:
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, automatic evaluation of datasets, evaluation methodologies, reproducibility
Contribution Types: Data resources, Data analysis
Languages Studied: English
Submission Number: 323