LightRetriever: A LLM-based Text Retrieval Architecture with Extremely Faster Query Inference

Guangyuan Ma; Yongliang Ma; Xuanrui Gou; Zhenpeng Su; Ming Zhou; Songlin Hu

LightRetriever: A LLM-based Text Retrieval Architecture with Extremely Faster Query Inference

Guangyuan Ma, Yongliang Ma, Xuanrui Gou, Zhenpeng Su, Ming Zhou, Songlin Hu

Published: 26 Jan 2026, Last Modified: 11 Apr 2026ICLR 2026 PosterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Information Retrieval, Efficient Deploy, Fast Query Inference, LLM-based Text Retrieval

Abstract: Large Language Models (LLMs)-based text retrieval retrieves documents relevant to search queries based on vector similarities. Documents are pre-encoded offline, while queries arrive in real-time, necessitating an efficient online query encoder. Although LLMs significantly enhance retrieval capabilities, serving deeply parameterized LLMs slows down query inference throughput and increases demands for online deployment resources. In this paper, we propose LightRetriever, a novel LLM-based retriever with extremely lightweight query encoders. Our method retains a full-sized LLM for document encoding, but reduces the workload of query encoding to no more than an embedding lookup. Compared to serving a full LLM on an A800 GPU, our method achieves over a thousand times of speedup in query encoding and over 10× increase in end-to-end retrieval throughput. Extensive experiments on large-scale retrieval benchmarks show that LightRetriever generalizes well across diverse tasks, maintaining an average of 95% retrieval performance.

Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning

Submission Number: 6716

Loading