Towards Personalized Deep Research: Benchmarks and Evaluations

Published: 26 Jan 2026, Last Modified: 01 Mar 2026, ICLR 2026 Conditional Poster, CC BY 4.0
Keywords: Personalization, benchmark, Deep Research, Agent
TL;DR: Introducing PDR-Bench and the PQR framework, the first comprehensive benchmark for evaluating personalization in Deep Research Agents.
Abstract: Deep Research Agents (DRAs) can autonomously conduct complex investigations and generate comprehensive reports, demonstrating strong real-world potential. However, existing evaluations mostly rely on closed-ended benchmarks, while open-ended deep research benchmarks remain scarce and typically neglect personalized scenarios. To bridge this gap, we introduce Personalized Deep Research Bench (PDR-Bench), the first benchmark for evaluating personalization in DRAs. It pairs 50 diverse research tasks across 10 domains with 25 authentic user profiles that combine structured persona attributes with dynamic real-world contexts, yielding 250 realistic user-task queries. To assess system performance, we propose the PQR Evaluation Framework, which jointly measures Personalization Alignment, Content Quality, and Factual Reliability. Our experiments on a range of systems highlight both the current capabilities and the limitations of DRAs in handling personalized deep research. This work establishes a rigorous foundation for developing and evaluating the next generation of truly personalized AI research assistants.
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 5646