Zero-shot Document Retrieval with Hybrid Pseudo-document Retriever

Published: 2025, Last Modified: 21 Jan 2026. Venue: ICASSP 2025. License: CC BY-SA 4.0
Abstract: The zero-shot retrieval task aims to retrieve the documents most relevant to a user’s query without relevance labels. Current approaches expand input queries by generating pseudo-documents with large language models (LLMs) and perform document retrieval based on the expanded queries. However, these approaches rely on either sparse or dense retrieval alone. In this paper, we propose a hybrid retriever to further improve the quality of the pseudo-documents and to obtain the relevant information more effectively. Specifically, we use sparse retrievers to obtain keyword information and dense retrievers to obtain contextual information. Then, we introduce reciprocal ranking fusion and weighted scoring fusion into both the pre-retrieval of candidate documents and the final retrieval of results to calculate the overall hybrid matching score. Experimental results on TREC DL19/DL20 and several datasets from the BEIR benchmark indicate the superiority of our proposed method.
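The abstract names two fusion strategies without giving their formulas. As a rough illustration only, the sketch below shows one common way each could be computed over the outputs of a sparse and a dense retriever. The function names, the constant `k=60`, the weight `alpha`, and the min-max normalization are all assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids (best first).

    Each document receives 1 / (k + rank) from every list it appears in.
    k=60 is a commonly used default, assumed here for illustration.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), then by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


def min_max(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def weighted_score_fusion(sparse_scores, dense_scores, alpha=0.5):
    """Fuse raw retriever scores as alpha * sparse + (1 - alpha) * dense.

    Scores are min-max normalized first so that the two retrievers are on
    a comparable scale (an assumption of this sketch). A document missing
    from one retriever contributes 0 for that retriever.
    """
    s, d = min_max(sparse_scores), min_max(dense_scores)
    fused = {
        doc: alpha * s.get(doc, 0.0) + (1 - alpha) * d.get(doc, 0.0)
        for doc in set(s) | set(d)
    }
    return sorted(fused, key=lambda doc: (-fused[doc], doc))
```

Both functions return document ids ordered from best to worst. Rank-based fusion ignores score magnitudes and needs no normalization, while the weighted variant lets `alpha` shift emphasis between keyword and contextual matching.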