From National Curricula to Cultural Awareness: Constructing Open-Ended Culture-Specific Question Answering Dataset

ACL ARR 2026 January Submission6902 Authors

06 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0
Keywords: Multicultural NLP, Cultural Awareness, LLM Agents, Language Resources, Automated Creation of Language Resources
Abstract: Large language models (LLMs) achieve strong performance on many tasks, but their progress remains uneven across languages and cultures, often reflecting values latent in English-centric training data. To enable practical cultural alignment, we propose a scalable approach that leverages national social studies curricula as a foundation for culture-aware supervision. We introduce CuCu, an automated multi-agent LLM framework that transforms national textbook curricula into open-ended, culture-specific question–answer pairs. Applying CuCu to the Korean national social studies curriculum, we construct KCaQA, comprising 34.1k open-ended QA pairs. Our quantitative and qualitative analyses suggest that KCaQA covers culture-specific topics and produces responses grounded in local sociocultural contexts.
Paper Type: Short
Research Area: Computational Social Science, Cultural Analytics, and NLP for Social Good
Research Area Keywords: values and culture, LLM agents, value-centered design, language resources, automatic creation and evaluation of language resources, NLP datasets
Contribution Types: Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: Korean, English, Chinese, Japanese
Submission Number: 6902