Keywords: Character Profiling, Large Language Models, LLM Evaluation
Abstract: Building high-quality character profiles is a foundational prerequisite for developing immersive Role-Playing Language Agents (RPLAs). However, existing profiling methods rely primarily on literature-based extraction or LLM-based generation, which suffer from limited media coverage, high manual costs, and a propensity for factual hallucination. To address these bottlenecks, we propose CharacterHub, an automated character profiling framework powered by deep search agents. Unlike traditional extractive pipelines, our framework autonomously navigates open web sources to retrieve and aggregate heterogeneous information across multiple dimensions. This agentic approach scales high-fidelity profiling beyond literary figures to anime, game, and user-generated characters without human intervention. To rigorously validate our method, we establish an automatic evaluation protocol that uses large-scale, human-curated data from Fandom as the gold reference. Experimental results show that the resulting dataset aligns closely with the reference sources, reaching an 83.13% Support Score in the critical personality dimension while attaining nearly twice the information density of the Fandom references. We will publicly release the dataset and associated resources.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: NLP datasets, automatic evaluation of datasets, metrics
Contribution Types: Data resources
Languages Studied: English
Submission Number: 8379