Keywords: Caregiver-child communication; Chinese corpus; natural language processing
TL;DR: We introduce the Resonance Corpus, a large-scale collection of natural Chinese caregiver–child conversations designed as infrastructure for aligning language models to underserved communities.
Abstract: We introduce the Resonance Corpus, a large-scale collection of natural Chinese caregiver–child conversations designed as infrastructure for aligning language models to underserved communities. The corpus captures everyday, intergenerational talk around child-friendly news prompts and includes rich contextual and cognitive information. We use this resource to argue for a research agenda that treats family dialogue as a key testbed for culturally grounded and developmentally appropriate AI. In particular, we outline how the corpus supports three strands of work: participatory alignment with community-contributed data, lightweight instruction tuning of Chinese LLMs under realistic computational budgets, and evaluation protocols that focus on cognitive fit and cultural robustness, rather than solely on generic benchmark scores. By framing Chinese caregiver–child dialogue as a core low-resource setting, we aim to provide open infrastructure for building language technologies that communicate more effectively with children and caregivers in underrepresented linguistic and cultural contexts.
Submission Number: 21