Keywords: LLM-based chatbots, children's personalities, personalized dialogue generation, children's caregiver
Abstract: The remarkable success of Large Language Models (LLMs) has enabled LLM-based chatbots for personalized tasks beyond generic dialogue. Personalization involves customizing LLMs to generate responses tailored to user preferences. One promising application is enabling personalized interactions that support children's caregivers while also promoting children's development and learning. However, dedicated research is required to determine whether LLMs can effectively deliver personalized responses based on children's preferences, as children's interactions differ from those of adults. We introduce ChildEval, a benchmark for evaluating LLMs' capacity to infer, interpret, and follow child-centered preferences in long-context conversational settings. The benchmark comprises 29K synthesized persona profiles of children aged 3-6, which encode their preferences both explicitly and implicitly; implicit preferences are embedded in dialogues of 6 to 10 turns. The preferences cover 5 top-level and 14 sub-level topics spanning children's daily lives and development. Using these child-centric preferences, we systematically evaluate the performance of open-source LLMs. Experimental results demonstrate the impact of different preference representations on LLM responses and indicate that fine-tuning on this dataset may enhance performance.
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 8998