Toward Data-driven Skill Identification for General-purpose Vision-language Models

Published: 04 Mar 2024, Last Modified: 02 May 2024
Venue: DPFM 2024 Poster
License: CC BY 4.0
Keywords: data resources, data analysis, vision-language models, factor analysis
TL;DR: Our study employs factor analysis to identify hidden skills responsible for the performance of vision-language models.
Abstract: The evolution of vision-language (VL) models towards broad competencies has complicated benchmarking, necessitating diverse tasks for accurate evaluation. Moving beyond intuition-guided task selection common in existing benchmarks, we propose a data-driven approach that leverages transfer performance and Factor Analysis (FA) to identify latent skills crucial for VL tasks. Our study demonstrates the utility of FA in systematically understanding and evaluating VL models.
Submission Number: 42
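
Illustrative sketch: the abstract describes applying Factor Analysis to transfer performance in order to surface latent skills. The Python snippet below is a minimal sketch of that idea, not the authors' implementation. It assumes a hypothetical matrix `perf` whose rows are source tasks and whose columns are target tasks, filled here with synthetic numbers generated from two made-up skills. It uses principal-component factor extraction on the correlation matrix as a simple stand-in for a full FA fit. The function name `extract_factors` and all sizes are assumptions made for illustration.

```python
import numpy as np


def extract_factors(perf, n_factors):
    """Principal-component factor extraction (a simple stand-in for full FA).

    perf: array of shape (n_sources, n_targets), one row per source task and
          one column per target task (hypothetical layout).
    Returns (loadings, explained), where loadings has shape
    (n_targets, n_factors) and explained holds each factor's share of
    the total variance of the standardized columns.
    """
    # Correlation between target-task columns across source tasks.
    corr = np.corrcoef(perf, rowvar=False)
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_factors]
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    explained = eigvals[order] / eigvals.sum()
    return loadings, explained


# Synthetic data: two made-up latent skills drive six target tasks.
rng = np.random.default_rng(0)
n_sources, n_targets = 40, 6
skills = rng.normal(size=(n_sources, 2))
true_loadings = np.array(
    [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float
)
noise = 0.1 * rng.normal(size=(n_sources, n_targets))
perf = skills @ true_loadings.T + noise

loadings, explained = extract_factors(perf, n_factors=2)
print(np.round(loadings, 2))
print(np.round(explained, 3))
```

Each row of `loadings` indicates how strongly a target task depends on each extracted factor. In a real analysis the number of factors would need to be chosen from the data, and a rotation is commonly applied to make the loadings easier to interpret.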