K2Profile: A Benchmark for Conversational Student Profiling under Child Speech and ASR Noise
Keywords: Conversational AI, Dialogue Systems, Benchmark Datasets, Child–Computer Interaction, Educational NLP, Spoken Dialogue Systems
TL;DR: A benchmark for evaluating conversational agents that interview K–2 students to build structured profiles while handling child speech variability and ASR noise.
Abstract: Conversational AI systems have achieved strong performance in adult-oriented dialogue tasks such as customer support, tutoring, and open-domain conversation. However, conversational systems designed to interact with young children remain underexplored. Children’s speech and conversational behavior often exhibit short utterances, phonetic variability, inconsistent grammar, playful responses, and limited domain knowledge, all of which pose challenges for existing dialogue systems. In addition, many real-world applications rely on automatic speech recognition (ASR), which introduces transcription errors that further complicate the understanding of child speech.
We introduce K2Profile, a benchmark for evaluating conversational agents that interview K–2 students to collect structured personal information about family, friends, and reading interests. The task requires agents to conduct multi-turn conversations and iteratively update a schema-based student profile. The dataset consists of real teacher–student conversations annotated with turn-level ground-truth profile states, enabling fine-grained evaluation of information elicitation during dialogue.
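The abstract does not publish the profile schema, so the following is only a minimal Python sketch under stated assumptions: the slot names (`family.siblings`, `reading.favorite_book`, and so on) and the `ProfileState`/`track` helpers are hypothetical. It illustrates what turn-level state tracking could look like if each turn contributes slot updates and the ground-truth state at turn t is the accumulation of all updates so far, with later values overwriting earlier ones (also an assumption).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical slot inventory; the real K2Profile schema may differ.
SCHEMA = (
    "family.siblings",
    "family.pets",
    "friends.best_friend",
    "reading.favorite_book",
    "reading.favorite_topic",
)


@dataclass
class ProfileState:
    # Every schema slot starts empty (None) until the child mentions it.
    slots: Dict[str, Optional[str]] = field(
        default_factory=lambda: {name: None for name in SCHEMA}
    )

    def update(self, turn_updates: Dict[str, str]) -> "ProfileState":
        """Return a new state with this turn's values merged in (later values overwrite)."""
        unknown = set(turn_updates) - set(self.slots)
        if unknown:
            raise KeyError(f"slots not in schema: {sorted(unknown)}")
        merged = dict(self.slots)  # copy so earlier turn states stay unchanged
        merged.update(turn_updates)
        return ProfileState(merged)


def track(turn_updates: List[Dict[str, str]]) -> List[ProfileState]:
    """Return one state per turn; state t reflects all updates through turn t."""
    states: List[ProfileState] = []
    state = ProfileState()
    for updates in turn_updates:
        state = state.update(updates)
        states.append(state)
    return states
```

For example, `track([{"family.pets": "dog"}, {}, {"reading.favorite_book": "Frog and Toad"}])` yields three states; the empty update models a turn where the child says nothing profile-relevant, so the previous state carries over unchanged.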
A key feature of the benchmark is the integration of child speech characteristics and ASR noise. Each interaction includes both human transcripts and ASR outputs with associated word error rates (WER), allowing evaluation under controllable noise conditions. To preserve realistic child dialogue behavior, the student simulator retrieves responses from real annotated interactions while mixing profile attributes from multiple students to increase diversity.
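To make the WER annotation concrete, here is a minimal sketch of word-level WER, (S + D + I) / N, where N is the number of words in the human (reference) transcript and S, D, I are the substitutions, deletions, and insertions found by Levenshtein alignment against the ASR output. Pooling edits across an interaction and the `noise_band` filter are hypothetical illustrations of how a noise condition could be fixed; the benchmark's actual text normalization and noise-control procedure may differ.

```python
from typing import Dict, List, Tuple


def word_edits(reference: str, hypothesis: str) -> Tuple[int, int]:
    """Return (substitutions + deletions + insertions, reference length) at word level."""
    ref = reference.lower().split()  # assumed normalization: lowercase, whitespace split
    hyp = hypothesis.lower().split()
    prev = list(range(len(hyp) + 1))  # distance from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference word r
                curr[j - 1] + 1,         # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution, or match if r == h
            )
        prev = curr
    return prev[-1], len(ref)


def interaction_wer(pairs: List[Tuple[str, str]]) -> float:
    """Pooled WER over the (human transcript, ASR output) pairs of one interaction."""
    edits = ref_words = 0
    for human, asr in pairs:
        e, n = word_edits(human, asr)
        edits += e
        ref_words += n
    if ref_words == 0:
        raise ValueError("interaction has no reference words")
    return edits / ref_words


def noise_band(
    interactions: Dict[str, List[Tuple[str, str]]], low: float, high: float
) -> List[str]:
    """Hypothetical noise condition: IDs of interactions whose WER lies in [low, high)."""
    return sorted(
        iid for iid, pairs in interactions.items()
        if low <= interaction_wer(pairs) < high
    )
```

Pooling edits over the whole interaction, rather than averaging per-utterance WER, keeps very short child utterances (for example a one-word "yes") from dominating the score.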
We evaluate systems using turn-level joint goal accuracy and slot-level accuracy for profile prediction. By combining real child–teacher conversations, structured state tracking, and controllable ASR noise, K2Profile provides a new benchmark for studying conversational AI systems in realistic child-interaction settings.
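A minimal sketch of the two metrics, assuming the standard dialogue-state-tracking definitions: joint goal accuracy is the fraction of turns whose predicted profile matches the gold profile on every slot, and slot-level accuracy is the fraction of individual (turn, slot) pairs predicted correctly. The dict-based state format and the case and whitespace normalization in `_norm` are assumptions, not the benchmark's documented scoring rules.

```python
from typing import Dict, List, Optional

State = Dict[str, Optional[str]]  # slot name -> value, None if not yet known


def _norm(value: Optional[str]) -> Optional[str]:
    # Assumed normalization: case-insensitive, collapse whitespace.
    return None if value is None else " ".join(value.lower().split())


def _check(pred: List[State], gold: List[State]) -> None:
    if not gold or len(pred) != len(gold):
        raise ValueError("need one predicted state per gold turn")


def joint_goal_accuracy(pred: List[State], gold: List[State]) -> float:
    """Fraction of turns where every gold slot is predicted correctly."""
    _check(pred, gold)
    hits = sum(
        all(_norm(p.get(slot)) == _norm(value) for slot, value in g.items())
        for p, g in zip(pred, gold)
    )
    return hits / len(gold)


def slot_accuracy(pred: List[State], gold: List[State]) -> float:
    """Fraction of (turn, slot) pairs whose predicted value matches the gold value."""
    _check(pred, gold)
    correct = total = 0
    for p, g in zip(pred, gold):
        for slot, value in g.items():
            total += 1
            correct += _norm(p.get(slot)) == _norm(value)
    return correct / total if total else 0.0
```

Slots are scored against the gold key set, so extra predicted slots are ignored. Because many slots stay empty early in an interview, slot-level accuracy can be much higher than joint goal accuracy for the same predictions.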
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 143