ChildEval: WHEN LARGE LANGUAGE MODELS MEET CHILDREN’S PERSONALITIES

ACL ARR 2026 January Submission 4269 Authors

05 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: LLM-based chatbots, children's personalities, personalized dialogue generation, children's caregiver
Abstract: While LLMs enable personalized chatbots, their effectiveness in responding to children remains unclear, as children’s interactions differ from those of adults and systematic preference evaluation is lacking. To address this gap, we introduce ChildEval, a benchmark for evaluating LLMs’ ability to infer and follow child-centered preferences in long-context conversations. ChildEval contains 29K synthesized persona profiles of children aged 3–6, providing relatively static background information. Each persona is associated with an explicit preference—which may align with, conflict with, or be independent of the persona—and with corresponding implicit preferences expressed through 6–10-turn dialogues. Explicit and implicit preferences capture the same preference but differ in expression, reflecting dynamic aspects independent of the static persona. The benchmark spans five top-level and fourteen sub-level categories covering children’s daily lives and development. We further propose fine-grained, child-centric evaluation protocols to systematically assess open-source LLMs. Experimental results demonstrate how different personalized representations affect LLM responses and suggest that fine-tuning on this dataset may enhance performance.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Resources and Evaluation, Language Modeling
Contribution Types: Reproduction study, Data resources, Data analysis
Languages Studied: Chinese and English
Submission Number: 4269