Keywords: LLM, Evaluation, Culture
Abstract: Large language models (LLMs) evolve not only in scale and benchmark performance but also in how they mediate human communication. We evaluate GPT-4, Claude, DeepSeek, and Qwen on culturally sensitive scenarios involving identity, language, and facework, treating cultural adaptation as an emergent ability across the LLM lifecycle. Using controlled prompts and interpreting results through Hofstede's cultural dimensions and the GLOBE framework, we find systematic divergences: Western models emphasize individualism and directness, while Chinese models adopt collectivist, high-context strategies. Moreover, GPT-4 shifts style when prompted in Chinese, revealing that cultural alignment is dynamic rather than fixed. These findings extend LLM evaluation beyond accuracy to the lifecycle of cross-cultural behavior, underscoring the need for culturally aware scaling and inclusive benchmarks.
Submission Number: 146