Track: Technical
Keywords: historical, image, dataset, evaluation, multi-modal
TL;DR: The paper introduces a dataset of sensitive historical images for evaluating how multi-modal models describe historical events and figures.
Abstract: How do we measure the way multi-modal generative models, like GPT-4 and
Gemini, describe images of historical events and figures whose legacies may be
nuanced, multifaceted, or contested? As a first step towards addressing this challenge,
we introduce Century, a new dataset of sensitive historical images. The dataset
consists of 1,500 images from recent history, created through a novel automated
method that combines knowledge graphs and language models and is rooted in
the practices of museums and digital archives. We demonstrate through automated
and human evaluation that this method produces a set of images depicting events
and figures that are diverse across topics and represent all regions of the world,
with implications for the development of evaluations for historical contextualisation
and socio-cultural understanding.
Submission Number: 61