When East Asia Loses Its Names: Interpreting Neighborhood Effect and Cultural Generalization in Vision-Language Models
Keywords: Vision-Language Models, Cultural Generalization, Neighborhood Effect, Cultural Recognition, Mutation Layers
TL;DR: LVLMs misrecognize East Asian cultural artifacts through the Neighborhood Effect and Cultural Generalization.
Abstract: Large vision-language models (LVLMs) perform well across multimodal tasks but still struggle with fine-grained cultural recognition among culturally similar countries. In this paper, we analyze the Neighborhood Effect, a cultural misidentification pattern in which artifacts from relatively underrepresented countries are interpreted as belonging to culturally similar but more dominant neighboring countries. We collect images of cultural artifacts centered on China, Japan, and Korea, and analyze the generated captions at both the word and token-probability levels. Word-level analysis shows that LVLMs often omit national-level cultural identities and shift underrepresented artifacts toward dominant neighboring cultures or broader labels such as “Asian,” a pattern we call Cultural Generalization. Token-probability analysis suggests that direct misidentification as a specific neighboring country is tied to early Vision Encoder or multimodal representation errors, whereas Cultural Generalization is more frequently associated with Mutation Layers in the later LLM stage under cultural uncertainty. These findings clarify how fine-grained cultural recognition failures in LVLMs emerge and take shape among culturally similar countries.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 9