Culturally-Aware Conversations: A Framework & Benchmark for LLMs

Published: 24 Sept 2025, Last Modified: 24 Sept 2025
Venue: NeurIPS 2025 LLM Evaluation Workshop Poster
License: CC BY 4.0
Keywords: holistic evaluation, benchmarks, evaluation, culture, conversations
Abstract: As LLMs grow and evolve, they are deployed in diverse contexts and cultures worldwide. However, existing benchmarks for cultural adaptation in LLMs are misaligned with the actual challenges these models face when interacting with users from diverse cultural backgrounds. In this work, we introduce the first framework and benchmark designed to evaluate LLMs in realistic, multicultural conversational settings. Grounded in sociocultural theory, our framework formalizes how linguistic style -- a key element of cultural communication -- is shaped by situational, relational, and cultural contexts. We construct a benchmark dataset based on this framework, annotated by culturally diverse raters, and propose a new set of desiderata for cross-cultural evaluation in NLP: conversational framing, stylistic sensitivity, and subjective correctness.
Submission Number: 148