Keywords: bilingual dream interpretation, cross-cultural alignment
Abstract: Humans have long sought to uncover the mystery of dreams, from divination treating them as signs that predict fortune and the future, to psychology framing them as reflections of the subconscious. This curiosity extends to large language models (LLMs): commercial LLMs, e.g., those from OpenAI and DeepSeek, already exhibit preliminary dream-interpretation abilities. However, open-source research remains limited to monolingual, Western-centric datasets, and evaluations are largely confined to classification tasks. We address these gaps by introducing a bilingual dataset of 31,877 unique dream–interpretation pairs in English and Arabic, spanning three cultural contexts: China, the Middle East, and the West. Analysis shows that $<$18\% of dream symbols overlap across cultures. Chinese symbols emphasize scenario-based activities and figures such as *Guanyin*; Arabic symbols reference religious concepts such as *paradise* and *fasting*; and English symbols draw on technology, such as *UFOs*, and fictional creatures. We evaluate 17 models and find that recent state-of-the-art models that integrate general-purpose and reasoning modes into a single model perform best in reasoning mode, whereas earlier models with separate chat and reasoning variants favor chat settings. While language is not a bottleneck for SOTA models, capturing the cultural nuances of under-represented regions, e.g., the Middle East, remains challenging. Fine-tuning six LLMs further shows that LoRA benefits larger models, while full-parameter fine-tuning works better for smaller ones. Although SFT equips models with cultural knowledge, knowledge acquired in post-training is less stable than knowledge from pre-training and is sensitive to training settings. Data and code are available at `http://URL.withheld.for.review`.
Primary Area: datasets and benchmarks
Submission Number: 24929