Toward Data-driven Skill Identification for General-purpose Vision-language Models

Published: 04 Mar 2024, Last Modified: 02 May 2024
Venue: DPFM 2024 Poster
License: CC BY 4.0
Keywords: data resources, data analysis, vision-language models, factor analysis
TL;DR: Our study employs factor analysis to identify hidden skills responsible for the performance of vision-language models.
Abstract: The evolution of vision-language (VL) models towards broad competencies has complicated benchmarking, necessitating diverse tasks for accurate evaluation. Moving beyond intuition-guided task selection common in existing benchmarks, we propose a data-driven approach that leverages transfer performance and Factor Analysis (FA) to identify latent skills crucial for VL tasks. Our study demonstrates the utility of FA in systematically understanding and evaluating VL models.
Submission Number: 42
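
To make the approach in the abstract concrete, the following is a minimal sketch of running factor analysis on a performance matrix. It is an illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the layout (rows are models or fine-tuned variants, columns are VL tasks) is assumed, the number of factors is fixed by hand, the helper name `fit_latent_skills` is hypothetical, and scikit-learn's `FactorAnalysis` with varimax rotation stands in for whatever FA procedure the paper actually uses.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis


def fit_latent_skills(performance, n_skills, seed=0):
    """Fit factor analysis to a (n_models, n_tasks) score matrix.

    Returns a (n_tasks, n_skills) loading matrix. Hypothetical helper,
    not taken from the paper.
    """
    # Standardize each task column so tasks with different score
    # scales contribute comparably.
    z = (performance - performance.mean(axis=0)) / performance.std(axis=0)
    fa = FactorAnalysis(n_components=n_skills, rotation="varimax",
                        random_state=seed)
    fa.fit(z)
    # components_ has shape (n_skills, n_tasks); transpose to task-major.
    return fa.components_.T


# Synthetic example (assumption): 40 models, 6 tasks, 2 hidden skills.
# Tasks 0-2 depend on skill 0 and tasks 3-5 depend on skill 1.
rng = np.random.default_rng(0)
n_models, n_tasks, n_skills = 40, 6, 2
true_loadings = np.zeros((n_tasks, n_skills))
true_loadings[:3, 0] = 1.0
true_loadings[3:, 1] = 1.0
skills = rng.normal(size=(n_models, n_skills))
noise = 0.1 * rng.normal(size=(n_models, n_tasks))
performance = skills @ true_loadings.T + noise

loadings = fit_latent_skills(performance, n_skills)
print(np.round(loadings, 2))  # one row per task, one column per factor
```

Tasks whose rows in `loadings` have large values in the same column can be read as drawing on the same latent skill. In practice the number of factors would need to be chosen from the data rather than fixed in advance.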