VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks

ACL ARR 2024 December Submission1687 Authors

Published: 16 Dec 2024 (modified: 05 Feb 2025) · ACL ARR 2024 December Submission · License: CC BY 4.0
Abstract: Domain generalizability is a crucial property of a deep learning model, as it determines the model's ability to perform well on data from unseen domains. However, research on the domain generalizability of deep learning models for vision-language tasks remains limited, primarily because of the lack of suitable datasets. To address this challenge, we propose **VolDoGer**: Vision-Language Dataset for Domain Generalization, a dedicated dataset designed for domain generalization that covers three vision-language tasks: image captioning, visual question answering, and visual entailment. We constructed **VolDoGer** by extending LLM-based data annotation techniques to vision-language tasks, thereby alleviating the burden of recruiting human annotators. We evaluated the domain generalizability of various models using **VolDoGer**.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: automatic creation and evaluation of language resources, NLP datasets
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 1687
