CultureShift: Mapping Temporal Cultural Evolution in Vision-Language Models

Published: 06 May 2025, Last Modified: 29 May 2025
Venue: VLMs4All 2025 Poster
License: CC BY 4.0
Keywords: cultural shift, evaluation, benchmark, dataset, temporal cultural evolution
Abstract: As vision-language models (VLMs) become embedded in global technologies, ensuring that they are culturally aware is critical for fairness, representation, and societal relevance. Yet current benchmarks for evaluating cultural competence treat culture as static, overlooking the fact that cultural norms, aesthetics, and values evolve over time. In this work, we introduce the concept of Temporal Cultural Awareness: the capacity of AI models to recognize and adapt to shifting cultural representations across decades. To operationalize this concept, we present a novel evaluation framework grounded in cinema, leveraging film as a time-aligned, globally resonant proxy for cultural evolution. We curate CineCulture, a dataset of annotated movie screenshots from Hollywood and Bollywood films spanning multiple decades, capturing fine-grained, visually evident cultural attributes across themes such as clothing, architecture, gender roles, and leisure. This dataset enables systematic assessment of how well VLMs reflect evolving cultural signals, both geographically and temporally. Our contributions include a new benchmark task, proposed evaluation metrics, and an empirical analysis revealing that popular VLMs often fail to track temporal cultural shifts. Our work calls for a new dimension of evaluation in culturally competent AI: not only geographic inclusivity but temporal inclusivity as well.
Submission Number: 22