Robustness of Cultural Norm Reasoning Under Language and Context Perturbations
Keywords: Cultural Norm Reasoning, Robustness, Multilingual Language Models, Context Perturbations
TL;DR: We test cultural robustness by checking whether multilingual LLMs change their norm judgments for the right reasons: adapting to relevant cultural context while staying stable under surface-level or irrelevant context changes.
Abstract: Real-world social judgments often depend on local norms, values, and situational expectations. We study whether multilingual language models use such cultural context robustly, or whether their judgments are overly sensitive to surface cues such as prompt language, country labels, and externally supplied contextual descriptions. Building on NormAd, a benchmark for cultural norm adherence, we hold the underlying scenario fixed while perturbing language, context granularity, and contextual consistency. We evaluate 12 models from four model families: Cohere Command, Cohere Tiny-Aya, Alibaba Qwen, and Google Gemma. Across three countries, five language conditions, and 55,000 raw generations, models benefit most from explicit norm descriptions: adding them improves accuracy by 17.8 percentage points over country-only prompts. In contrast, adding externally supplied rules that do not apply to the scenario is the most harmful perturbation, reducing accuracy by 24.5 percentage points. Language changes flip 9–14% of predictions, but these flips are directionally balanced: roughly as many predictions move from wrong to correct as from correct to wrong. These results show why flip rate alone is insufficient: it does not distinguish correct-to-wrong (C→W) from wrong-to-correct (W→C) transitions. We therefore report cultural robustness using signed accuracy change, C→W/W→C decompositions, and perturbation-specific vulnerability alongside clean accuracy.
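To make the distinction between flip rate and signed accuracy change concrete, the following is a minimal Python sketch of how the quantities named in the abstract could be computed for one perturbation. It is an illustration under stated assumptions, not the authors' evaluation code: the function name `flip_decomposition`, the `FlipStats` container, and the label strings are hypothetical, and the sketch assumes each scenario has one gold label, one prediction under the clean prompt, and one prediction under the perturbed prompt.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FlipStats:
    """Per-perturbation summary (hypothetical container, not from the paper)."""
    n: int
    flip_rate: float               # share of items whose prediction changed
    c_to_w: float                  # share that went correct -> wrong
    w_to_c: float                  # share that went wrong -> correct
    signed_accuracy_change: float  # perturbed accuracy minus clean accuracy


def flip_decomposition(
    gold: Sequence[str],
    clean_pred: Sequence[str],
    perturbed_pred: Sequence[str],
) -> FlipStats:
    """Compare predictions on the same scenarios before and after a perturbation."""
    if not (len(gold) == len(clean_pred) == len(perturbed_pred)):
        raise ValueError("all three sequences must have the same length")
    n = len(gold)
    if n == 0:
        raise ValueError("need at least one example")

    flips = c_to_w = w_to_c = 0
    for g, c, p in zip(gold, clean_pred, perturbed_pred):
        if c != p:
            # Any change counts as a flip, including wrong -> wrong
            # when there are more than two labels.
            flips += 1
        if c == g and p != g:
            c_to_w += 1
        elif c != g and p == g:
            w_to_c += 1

    return FlipStats(
        n=n,
        flip_rate=flips / n,
        c_to_w=c_to_w / n,
        w_to_c=w_to_c / n,
        # Accuracy only moves on C->W and W->C transitions, so the
        # signed change equals their difference.
        signed_accuracy_change=(w_to_c - c_to_w) / n,
    )


# Toy data (made up for illustration): two flips that cancel out.
gold = ["yes", "no", "yes", "no"]
clean = ["yes", "no", "no", "no"]
perturbed = ["no", "no", "yes", "no"]
stats = flip_decomposition(gold, clean, perturbed)
print(stats)
```

In this toy case half of the predictions flip, yet the signed accuracy change is zero because one C→W transition is offset by one W→C transition. A flip rate reported on its own would not reveal that.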
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 52