LLMEvalRec: An Agentic Framework for Simulating Users to Evaluate News Recommendation Systems
Keywords: Large Language Models, User Simulation, News Recommendations
TL;DR: A novel framework uses LLMs to simulate realistic user behavior for evaluating news recommendation systems, achieving near-perfect correlation with real user data and addressing cold-start challenges in system development.
Abstract: Evaluating news recommendation systems (NRS) presents unique challenges due to their dynamic and interactive nature coupled with evolving user interests. In the early stages of development, when user bases and historical data are scarce, it is difficult to conduct meaningful offline and online evaluations. This cold-start evaluation challenge hinders data-driven decision-making for product development and deployment. To address this, we propose LLMEvalRec, a framework that leverages Large Language Model (LLM) agents to simulate user behavior for NRS evaluation. Our approach features generative agents that automatically construct user profiles from a small number of user reading histories and perform realistic actions. We further introduce the Guided Episodic Search (GUES) algorithm, which adapts successful human prompt engineering practices into an automated optimization process. Experiments demonstrate that LLMEvalRec-generated data achieves 0.97 Spearman correlation with real evaluation rankings, significantly outperforming baseline simulators (0.4 and -0.05), and successfully predicts relative performance trends across both the MIND benchmark and real customer datasets. Production environment validation shows consistent alignment between simulated metrics and real click-through rate (CTR) improvements.
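The abstract's headline result is a rank agreement between simulated and real evaluations. As a hedged illustration only (the system names and scores below are invented, not from the paper), the following sketch shows how Spearman correlation compares a simulator's ranking of NRS variants against the ranking produced by a real evaluation:

```python
# Hypothetical illustration of rank agreement between a simulated and a real
# evaluation of news recommendation systems. All names and scores are invented.

def spearman_rho(ranks_a, ranks_b):
    """Spearman correlation for two tie-free rankings of the same items."""
    n = len(ranks_a)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def to_ranks(scores):
    """Map each system's score to its rank (1 = best)."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {system: r + 1 for r, system in enumerate(order)}

# Invented per-system metrics (e.g. simulated engagement vs. real CTR).
simulated = {"nrs_a": 0.31, "nrs_b": 0.27, "nrs_c": 0.22, "nrs_d": 0.18}
real      = {"nrs_a": 0.052, "nrs_b": 0.049, "nrs_c": 0.041, "nrs_d": 0.044}

sim_ranks, real_ranks = to_ranks(simulated), to_ranks(real)
systems = sorted(simulated)
rho = spearman_rho([sim_ranks[s] for s in systems],
                   [real_ranks[s] for s in systems])
print(f"Spearman rho = {rho:.2f}")  # prints "Spearman rho = 0.80"
```

A rho near 1 (the paper reports 0.97) means the simulator orders candidate systems almost exactly as a real evaluation would, which is what makes it usable for cold-start model selection.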
Area: Modelling and Simulation of Societies (SIM)
Generative AI: I acknowledge that I have read and will follow this policy.
Submission Number: 219