NormLens: Massively Multicultural MLLM Reasoning with Fine-Grained Social Awareness

Yi R. Fung; Heng Ji

Published: 24 Jul 2025, Last Modified: 01 Aug 2025, Venue: Social Sim'25, License: CC BY 4.0

Keywords: sociocultural-awareness; dialogue generation

Abstract: Multimodal large language models (MLLMs) have revolutionized many applications but still suffer from cultural bias and lack the cultural commonsense knowledge needed to guide cross-cultural communication and interaction. In particular, prior studies in the cultural domain largely overlook the fine-grained situational context that reflects the diverse and rich cultures across the world. To bridge this gap, we introduce a novel approach for massively multicultural MLLM knowledge acquisition at the level of fine-grained social awareness. First, we construct a dataset, NormLens, for benchmarking sociocultural norm-aware reasoning in the underlying LLM backbones, by extracting and curating 42,000 culturally grounded assertions from Wikipedia, spanning 1,000+ sub-country regions and 2,000+ ethnolinguistic groups, with automated cleaning to produce self-contained sentences and fine-grained cultural profile extraction. Building on this, we propose a framework for multimodal cultural knowledge acquisition, MM-ACE (Multi-Modal Alignment with Cultural Enhancement), via scalable finetuning on contrastive (norm, dialogue, image) triplets. Experiments demonstrate that MM-ACE improves cultural norm violation detection by 7.5% F-score over baselines, with particularly strong gains on fine-grained situational understanding tasks in our manually curated gold standard test set.
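For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F-score for binary norm violation detection could be computed. It is an illustration under stated assumptions, not the authors' evaluation code: the function name `f_score`, the label convention (1 = violation, 0 = no violation), and the default `beta = 1` are all hypothetical, since the abstract does not specify which F-score variant or averaging scheme is used.

```python
from typing import Sequence


def f_score(gold: Sequence[int], pred: Sequence[int], beta: float = 1.0) -> float:
    """F-beta score for the positive class.

    Assumed convention (not from the paper): label 1 means the example
    violates a cultural norm, label 0 means it does not.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)

    # With no true positives, precision or recall is zero (or undefined),
    # so the score is reported as 0.0.
    if tp == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Toy labels, invented purely for illustration.
    gold_labels = [1, 0, 1, 1, 0]
    predicted = [1, 1, 0, 1, 0]
    print(f_score(gold_labels, predicted))
```

With `beta = 1` this is the harmonic mean of precision and recall; whether the 7.5% figure is an absolute or relative gain on such a score is not stated in the abstract.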

Submission Number: 23
