INFOSEEK: A Dataset with Information-Seeking Queries for Context-Aware Evaluation of Large Language Models
Keywords: information seeking, model behavior, diversity in responses
Abstract: Users increasingly rely on large language model (LLM)–powered chatbots for information seeking, often replacing traditional search engines. This new high-stakes role of LLMs as information intermediaries requires systematic evaluation. However, existing frameworks assume that query resolution requires only fact retrieval, whereas real-world information-seeking interactions with LLMs are far more complex. To address this gap, we introduce INFOSEEK, a dataset of over 3k information-seeking queries drawn from user interactions with LLM chatbots. It is annotated for various information needs (e.g., factual, analytical, subjective) and for high-stakes topics (i.e., those that can impact people's lives and decision-making). We use INFOSEEK to evaluate model behavior on complex information needs, including queries that require personalization (the majority of queries in our data). Our findings show that LLMs generate more diverse responses to queries requiring content synthesis or evaluation than to those requiring only fact retrieval.
Our results highlight the complexity of real-world information needs, and the importance of accounting for this complexity when evaluating model behavior or designing realistic personalization datasets.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: corpus creation; model bias/fairness evaluation; human-computer interaction
Contribution Types: Model analysis & interpretability, Data resources, Data analysis
Languages Studied: English
Submission Number: 9195