INFOSEEK: A Dataset with Information-Seeking Queries for Context-Aware Evaluation of Large Language Models
Keywords: information seeking, model behavior, diversity in responses
Abstract: Users increasingly rely on large language model (LLM)–powered chatbots for information seeking, often replacing traditional search engines. This new high-stakes role of LLMs as information intermediaries requires systematic evaluation. However, existing frameworks assume that query resolution requires only fact retrieval, whereas real-world information-seeking interactions with LLMs are far more complex. To address this gap, we introduce INFOSEEK, a dataset of over 3k information-seeking queries drawn from user interactions with LLM chatbots. It is annotated for various information needs (e.g., factual, analytical, subjective) and for high-stakes topics (i.e., those that can impact people's lives and decision-making). We use INFOSEEK to evaluate model behavior on complex information needs, including queries that require personalization (the majority of queries in our data). Our findings show that LLMs generate more diverse responses to queries requiring content synthesis or evaluation than to those requiring only fact retrieval.
Our results highlight the complexity of real-world information needs, and the importance of accounting for this complexity when evaluating model behavior or designing realistic personalization datasets.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: corpus creation; model bias/fairness evaluation; human-computer interaction
Contribution Types: Model analysis & interpretability, Data resources, Data analysis
Languages Studied: English
Submission Number: 9195