Bridge the Query and Document: Contrastive Learning for Generative Document Retrieval

Published: 01 Jan 2024, Last Modified: 16 May 2025, IJCNN 2024, License: CC BY-SA 4.0
Abstract: Generative retrieval has garnered significant attention for its end-to-end optimization and exceptional performance. Unlike the dense retrieval paradigm, generative retrieval maps a query directly to a relevant document ID relying only on its model parameters, which greatly simplifies the retrieval process. However, generative retrieval faces two challenges: it does not explicitly model the semantic relevance between a query and a document, and there is a gap between the representations of queries and documents. To address these challenges, we propose the Contrastive Search Index (ConSI), a simple but effective contrastive learning framework for generative document retrieval. Experiments show that ConSI consistently outperforms the previous generative retrieval baseline, NCI. Further analysis of different factors and metrics verifies the performance gains brought by our method. In addition, ConSI achieves excellent performance in the dense retrieval paradigm, demonstrating that the framework strengthens representation learning and that the trained model can be used directly as a dense retriever.
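The abstract does not give ConSI's exact training objective, so the following is only a generic illustration of the idea it describes: a contrastive loss that pulls each query embedding toward its relevant document and pushes it away from the other documents in the batch, plus the cosine-similarity ranking that lets the same encoder act as a dense retriever. It is a minimal sketch assuming a standard in-batch InfoNCE loss with cosine similarity; the function names (`info_nce_loss`, `dense_retrieve`), the temperature of 0.05, and the toy vectors are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(queries: List[Vector], docs: List[Vector],
                  temperature: float = 0.05) -> float:
    """Hypothetical in-batch InfoNCE loss (not necessarily ConSI's exact loss).

    docs[i] is the positive document for queries[i]; every other document
    in the batch serves as a negative. Returns the mean of -log softmax
    at the positive index. The temperature value is an assumption.
    """
    assert len(queries) == len(docs)
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, d) / temperature for d in docs]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / len(queries)


def dense_retrieve(query: Vector, docs: List[Vector], k: int = 1) -> List[int]:
    """Return indices of the top-k documents ranked by cosine similarity."""
    order = sorted(range(len(docs)),
                   key=lambda j: cosine(query, docs[j]), reverse=True)
    return order[:k]


# Toy usage with hand-picked 2-D embeddings (illustrative only).
queries = [[1.0, 0.1], [0.1, 1.0]]
docs = [[0.9, 0.0], [0.0, 0.8]]
aligned = info_nce_loss(queries, docs)        # each query paired with its match
swapped = info_nce_loss(queries, docs[::-1])  # positives deliberately mismatched
print(aligned < swapped)                      # True
print(dense_retrieve([1.0, 0.2], docs))       # [0]
```

Minimizing a loss of this kind makes matching query and document embeddings close in the shared space, which is one way to narrow the query/document representation gap the abstract mentions; that is also why the encoder can rank documents by plain similarity afterward. How ConSI combines such a term with docid generation is not specified in the abstract.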