Keywords: LLM/AI agents, benchmarking, prompting
Abstract: Large Language Models have advanced autonomous agents, but personalization remains essential for agents to be practically useful.
To measure this capability, recent benchmarks aim to evaluate personalization in agents. However, they either provide static preference snapshots or fixed interaction logs, or they evaluate personalization mainly through question answering over retrieved profiles. These designs under-represent the complexity of real preferences in dialogue histories and fail to assess preference-conditioned task execution, thereby obscuring a critical knowing-doing gap. To address this, we introduce PersonaKAG, a benchmark for implicit behavioral alignment built from longitudinal interaction histories that contain noise, implicit cues, and temporal inconsistencies. PersonaKAG evaluates whether an agent can execute tasks while satisfying implicit constraints inferred from its history, rather than merely answering preference questions.
We further propose SynRPG, a framework that combines broad retrieval with trajectory-level alignment to resolve conflicting priorities over time. Results on PersonaKAG suggest that effective personalization is still challenging for state-of-the-art LLM agents.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, corpus creation, automatic evaluation of datasets
Languages Studied: English
Submission Number: 5427