Community-Centered Measurement of Cultural Content in AI Images

Published: 29 Apr 2026, Last Modified: 29 Apr 2026 · Eval Eval @ ACL 2026 Poster · CC BY 4.0
Keywords: measurement, evaluation, cultural evaluation, LLM-as-a-judge, rubrics, multimodal evaluation, text-to-image generation
TL;DR: We propose a new methodology to design LLM-as-a-judge evaluation rubrics that capture the lived expertise of community members.
Abstract: Recent efforts to automate and quantify generative AI evaluation can be in tension with the ability of measurement instruments to capture the expertise and perspectives of communities impacted by AI. In this paper, we explore how to involve communities in drafting evaluation rubrics that can be used to score AI images of cultural content. Specifically, we systematize the concept of "culturally appropriate" depictions of cultural content (i.e., culturally significant objects) through case studies with three communities: blind and low vision individuals residing in the UK, and residents of two distinct Indian states. Our systematized concepts reflect community members' lived experiences and desires for cultural representation, demonstrating the value of community involvement in defining valid measures. We explore how these systematized concepts can be operationalized into automated measurement instruments that could be applied across contexts using a multimodal LLM-as-a-judge approach. We point to timely opportunities for research to advance methods that bring community expertise into the sociotechnical measurement of generative AI systems.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Type: Research Paper
Archival Status: Non-archival
Submission Number: 72