From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment

ACL ARR 2024 June Submission 1053 Authors

14 Jun 2024 (modified: 06 Aug 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Large language models (LLMs) have enhanced the capacity of vision-language models to caption visual content. This generative approach to image caption enrichment makes textual captions more descriptive, improving alignment with the visual context. However, while many studies focus on the benefits of generative caption enrichment (GCE), are there any negative side effects? We compare standard-format captions with those produced by recent GCE processes from the perspectives of gender bias and hallucination, showing that enriched captions suffer from both increased gender bias and increased hallucination. Furthermore, models trained on these enriched captions amplify gender bias by an average of 30.9% and increase hallucination by 59.5%. This study serves as a caution against the trend of making captions more descriptive.
Paper Type: Short
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: Fairness, Generative Caption Enrichment, Gender bias, Hallucination
Contribution Types: Data analysis
Languages Studied: English
Submission Number: 1053