Beyond Manual Prompts: In-Context Learning for LLM Query Expansion in Information Retrieval via Auto-Generated Pseudo-Relevance Datasets

ACL ARR 2025 May Submission 7475 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Query expansion (QE) enhances information retrieval (IR) by addressing vocabulary gaps between queries and documents. While large language models (LLMs) enable generative QE through in-context learning with few examples, existing methods rely on manual prompts or static datasets, limiting domain adaptability and systematic evaluation of few-shot selection strategies. We propose an automated framework to construct domain-adaptive QE candidate datasets without human annotation. Leveraging an unlabeled target-domain corpus and a BM25-then-MonoT5 retrieval pipeline, our method retrieves pseudo-relevant passages for seed queries and transforms them into few-shot exemplar candidates. We evaluate four strategies for selecting LLM demonstrations: static, random, clustering-based diversity, and embedding-based similarity. Experiments across web search (TREC 2019 and 2020 DL Track), financial (FiQA), and open-domain entity queries (DBPedia) using Qwen-2.5-7B-Instruct show that LLM-generated expansions substantially improve BM25 retrieval performance. Our framework provides a scalable, domain-adaptive solution for in-context query expansion with LLMs, serving as both a reproducible benchmark for evaluation and a practical tool for real-world deployment, while enabling further research on few-shot example selection for in-context learning from large candidate pools.
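The sketch below illustrates, under stated assumptions, the two steps the abstract describes: building pseudo-relevance few-shot candidates with a BM25-then-MonoT5 pipeline, and selecting demonstrations by embedding-based similarity. The library choices (rank_bm25, transformers, sentence-transformers), the MonoT5 and encoder checkpoints, and all function names are illustrative assumptions, not the authors' exact implementation.

```python
# Hedged sketch of the pipeline described in the abstract; names and models are assumptions.
import torch
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util
from transformers import T5Tokenizer, T5ForConditionalGeneration

# --- Stage 1: BM25 retrieval over an unlabeled target-domain corpus ---
corpus = ["passage one ...", "passage two ...", "passage three ..."]  # placeholder passages
bm25 = BM25Okapi([p.lower().split() for p in corpus])

def bm25_top_k(query: str, k: int = 20) -> list[str]:
    return bm25.get_top_n(query.lower().split(), corpus, n=k)

# --- Stage 2: MonoT5 reranking with the standard "Query: ... Document: ... Relevant:" scoring ---
mt5_name = "castorini/monot5-base-msmarco"  # assumed checkpoint
tok = T5Tokenizer.from_pretrained(mt5_name)
mt5 = T5ForConditionalGeneration.from_pretrained(mt5_name).eval()
TRUE_ID, FALSE_ID = tok.encode("true")[0], tok.encode("false")[0]

@torch.no_grad()
def monot5_score(query: str, passage: str) -> float:
    ids = tok(f"Query: {query} Document: {passage} Relevant:",
              return_tensors="pt", truncation=True).input_ids
    start = torch.tensor([[mt5.config.decoder_start_token_id]])
    logits = mt5(input_ids=ids, decoder_input_ids=start).logits[0, 0, [TRUE_ID, FALSE_ID]]
    return torch.softmax(logits, dim=-1)[0].item()  # probability of "true"

def build_candidates(seed_queries: list[str], k: int = 20, keep: int = 3) -> list[dict]:
    """Pair each seed query with its top reranked passages to form a few-shot exemplar candidate."""
    candidates = []
    for q in seed_queries:
        reranked = sorted(bm25_top_k(q, k), key=lambda p: monot5_score(q, p), reverse=True)
        candidates.append({"query": q, "pseudo_relevant": reranked[:keep]})
    return candidates

# --- Few-shot selection: embedding-based similarity between test query and candidate queries ---
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder

def select_similar(test_query: str, candidates: list[dict], n_shots: int = 4) -> list[dict]:
    cand_emb = encoder.encode([c["query"] for c in candidates], convert_to_tensor=True)
    q_emb = encoder.encode(test_query, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, cand_emb)[0]
    top = torch.topk(scores, k=min(n_shots, len(candidates))).indices.tolist()
    return [candidates[i] for i in top]
```

The selected candidates would then be formatted as demonstrations in the LLM prompt; the static, random, and clustering-based strategies mentioned in the abstract would replace `select_similar` with the corresponding selection rule.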
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: passage retrieval, automatic creation and evaluation of language resources, NLP datasets, prompting
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Data resources
Languages Studied: English
Submission Number: 7475