From National Curricula to Cultural Awareness: Constructing an Open-Ended Culture-Specific Question Answering Dataset
Keywords: Multicultural NLP, Cultural Awareness, LLM Agents, Language Resources, Automated Creation of Language Resources
Abstract: Large language models (LLMs) achieve strong performance on many tasks, but their progress remains uneven across languages and cultures, often reflecting values latent in English-centric training data.
To enable practical cultural alignment, we propose a scalable approach that leverages national social studies curricula as a foundation for culture-aware supervision.
We introduce CuCu, an automated multi-agent LLM framework that transforms national textbook curricula into open-ended, culture-specific question–answer pairs.
Applying CuCu to the Korean national social studies curriculum, we construct KCaQA, comprising 34.1k open-ended QA pairs.
Our quantitative and qualitative analyses suggest that KCaQA covers culture-specific topics and elicits responses grounded in local sociocultural contexts.
Paper Type: Short
Research Area: Computational Social Science, Cultural Analytics, and NLP for Social Good
Research Area Keywords: values and culture, LLM agents, value-centered design, language resources, automatic creation and evaluation of language resources, NLP datasets
Contribution Types: Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: Korean, English, Chinese, Japanese
Submission Number: 6902