REVO-LION: Evaluating and Refining Vision-Language Instruction Tuning Datasets

15 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: Evaluation, Refine, Vision-Language, Instruction Tuning.
Abstract: There is an emerging line of research on multimodal instruction tuning, and various benchmarks have been proposed for evaluating the resulting models. Instead of evaluating the models directly, in this paper we evaluate the Vision-Language Instruction-Tuning (VLIT) datasets themselves, and further seek a way to build a dataset for developing an all-powerful VLIT model, which we believe could be fundamental for establishing a grounded protocol for benchmarking VLIT models. Since how to analyze VLIT datasets effectively remains an open question, we propose a tune-cross-evaluation paradigm: tuning on one dataset and evaluating on each of the others in turn. For each tune-evaluation set, we define the Meta Quality (MQ) as the mean of the BLEU, METEOR, and ROUGE-L scores to quantify the quality of a dataset or a sample. On this basis, to evaluate the comprehensiveness of a dataset, we develop the Dataset Quality (DQ), which covers all tune-evaluation sets. To lay the foundation for building a comprehensive dataset and developing an all-powerful model, we further define the Sample Quality (SQ), which quantifies the all-sided quality of each sample. Extensive experiments validate the rationality of the proposed evaluation paradigm. Based on this holistic evaluation, we build a new dataset, REVO-LION (REfining VisiOn-Language InstructiOn tuNing), by collecting the samples with higher SQ from each dataset. With only half of the full data, the model trained on REVO-LION achieves performance comparable to that of a model trained on the simple combination of all VLIT datasets. In addition to supporting the development of an all-powerful model, REVO-LION also includes an evaluation set, which is expected to serve as a convenient evaluation benchmark for future research.
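The abstract defines the Meta Quality (MQ) as the mean of BLEU, METEOR, and ROUGE-L between a generated answer and a reference answer. Below is a minimal Python sketch of such a per-sample MQ computation; the exact BLEU n-gram weights, smoothing, and tokenization are assumptions and may differ from the paper's setup.

```python
# Hedged sketch of the per-sample Meta Quality (MQ): the mean of BLEU,
# METEOR, and ROUGE-L for one prediction/reference pair. Whitespace
# tokenization and BLEU smoothing are assumptions, not the paper's spec.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score
from rouge_score import rouge_scorer

_rouge = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)


def meta_quality(prediction: str, reference: str) -> float:
    """Return the mean of BLEU, METEOR, and ROUGE-L F1 for one pair."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    bleu = sentence_bleu(
        [ref_tokens], pred_tokens,
        smoothing_function=SmoothingFunction().method1,
    )
    # nltk >= 3.6.6 expects pre-tokenized references and hypothesis.
    meteor = meteor_score([ref_tokens], pred_tokens)
    rouge_l = _rouge.score(reference, prediction)["rougeL"].fmeasure
    return (bleu + meteor + rouge_l) / 3.0


# Example usage on a single tune-evaluation pair.
print(meta_quality("a dog running on the grass", "a dog runs on the grass"))
```

Dataset-level DQ and sample-level SQ would then aggregate such MQ values across all tune-evaluation sets, as described in the abstract.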
Supplementary Material: zip
Primary Area: datasets and benchmarks
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 65