From Guanyin, UFOs to Paradise: Capturing Cultural Variation in Dream Interpretation

ACL ARR 2026 January Submission3710 Authors

04 Jan 2026 (modified: 20 Mar 2026); ACL ARR 2026 January Submission; CC BY 4.0
Keywords: bilingual dream interpretation, cross-cultural alignment
Abstract: Humans have long studied dreams, treating them as predictors of fortune and the future or as reflections of the subconscious. This curiosity now extends to large language models (LLMs). Commercial LLMs exhibit preliminary dream interpretation abilities, while open-source research remains limited to monolingual, Western-centric datasets, with evaluations largely confined to classification tasks. We address these gaps by introducing a bilingual dataset of 31,877 unique dream-interpretation pairs across three cultural contexts (Chinese, Islamic, and Western) in English and Arabic. Fewer than 22% of dream symbols overlap across cultures. Chinese symbols emphasize scenario-based activities and figures like *Guanyin*, Islamic symbols reference religion and concepts (*paradise*, *fasting*), and Western symbols draw on technology such as *UFOs*. We evaluated 17 models. Recent state-of-the-art (SOTA) models that integrate general-purpose and reasoning modes in a single model perform best in reasoning mode, while earlier models that separate chat and reasoning perform better in chat settings. Although language is not a bottleneck for SOTA models, capturing the cultural nuances of under-represented regions, such as the Islamic world, remains challenging. Fine-tuning six LLMs shows that LoRA benefits larger models, while full-parameter fine-tuning is better for smaller ones. Although supervised fine-tuning (SFT) equips models with cultural knowledge, knowledge acquired in post-training is less stable than knowledge acquired in pre-training and is sensitive to training settings. Data and code are available at `http://URL.withheld.for.review`.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: multilingual benchmarks, multilingual evaluation, NLP datasets, datasets for low resource languages
Contribution Types: Data resources
Languages Studied: English, Arabic, Chinese
Submission Number: 3710