Track: Search and retrieval-augmented AI
Keywords: Text Retrieval, Re-Ranking, Lightweighting, Flexibility
Abstract: Large language models (LLMs) provide powerful foundations for fine-grained text re-ranking. However, they are often prohibitively expensive in practice due to computational constraints. In this work, we propose a flexible architecture called Matryoshka Re-Ranker, which is designed to facilitate runtime customization of model layers and sequence lengths at each layer based on users' configurations. Consequently, LLM-based re-rankers can be made applicable across various real-world situations.
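To make the idea of runtime customization concrete, below is a minimal sketch in Python. All names here (CompressionConfig, MatryoshkaReRankerSketch) are illustrative assumptions, as is the plain truncation used for per-layer sequence compression; this is not the paper's implementation.

```python
from dataclasses import dataclass
from typing import List

import torch
import torch.nn as nn


@dataclass
class CompressionConfig:
    """User-specified runtime configuration (hypothetical names)."""
    num_layers: int         # how many transformer layers to keep
    seq_lengths: List[int]  # token budget at each retained layer


class MatryoshkaReRankerSketch(nn.Module):
    def __init__(self, layers: nn.ModuleList, score_head: nn.Linear):
        super().__init__()
        self.layers = layers          # full stack of transformer layers
        self.score_head = score_head  # maps a hidden state to a relevance score

    def forward(self, hidden: torch.Tensor, cfg: CompressionConfig) -> torch.Tensor:
        # hidden: [batch, seq_len, dim] token states of the query-passage pair
        for layer, budget in zip(self.layers[:cfg.num_layers], cfg.seq_lengths):
            hidden = layer(hidden)
            # Compress the sequence at this layer. A real system would select
            # salient tokens; plain truncation stands in for that here.
            hidden = hidden[:, :budget, :]
        # Score from the final token's representation (an assumption).
        return self.score_head(hidden[:, -1, :]).squeeze(-1)
```

A user could then trade precision for latency at serving time, e.g. CompressionConfig(num_layers=8, seq_lengths=[512] * 4 + [256] * 4).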
The increased flexibility may come at the cost of precision. To address this problem, we introduce a suite of techniques to optimize performance. First, we propose cascaded self-distillation, where each sub-architecture learns to preserve the precise re-ranking capability of its super-architectures, whose predictions can be exploited as smooth and informative teacher signals.
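As an illustration of such a cascaded objective, the sketch below distills each smaller sub-architecture from its next-larger sibling using a temperature-softened KL divergence over candidate scores. The function name and the exact KL formulation are assumptions for exposition, not the paper's loss.

```python
import torch
import torch.nn.functional as F


def cascaded_self_distillation_loss(
    scores_by_scale: list,   # scores from largest (index 0) to smallest sub-architecture
    temperature: float = 1.0,
) -> torch.Tensor:
    """Each smaller sub-architecture distills from its next-larger "super" sibling.

    scores_by_scale[i]: [batch, num_candidates] relevance scores for one scale.
    """
    loss = torch.zeros((), device=scores_by_scale[0].device)
    for i in range(1, len(scores_by_scale)):
        # Teacher: the next-larger sub-architecture; detached so gradients
        # do not flow into the teacher.
        teacher = F.softmax(scores_by_scale[i - 1].detach() / temperature, dim=-1)
        student = F.log_softmax(scores_by_scale[i] / temperature, dim=-1)
        loss = loss + F.kl_div(student, teacher, reduction="batchmean")
    return loss / max(len(scores_by_scale) - 1, 1)
```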
Second, we design a factorized compensation mechanism, where two collaborative Low-Rank Adaptation (LoRA) modules, vertical and horizontal, are jointly employed to compensate for the precision loss resulting from arbitrary combinations of layer and sequence compression.
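A hedged sketch of what such a factorized adapter could look like: a frozen base linear layer augmented with two LoRA branches, one nominally compensating depth (layer) compression and one width (sequence) compression. The class name, the rank, and how each branch maps to a compression axis are assumptions of this sketch.

```python
import torch
import torch.nn as nn


class FactorizedLoRALinear(nn.Module):
    """Frozen linear layer plus two collaborative LoRA branches (sketch)."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the adapters are trained
        in_f, out_f = base.in_features, base.out_features
        # Standard LoRA initialization: A random, B zero, so the
        # adapters start as an identity perturbation.
        self.vert_A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.vert_B = nn.Parameter(torch.zeros(out_f, rank))
        self.horiz_A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.horiz_B = nn.Parameter(torch.zeros(out_f, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta_v = x @ self.vert_A.T @ self.vert_B.T    # depth-compensation branch
        delta_h = x @ self.horiz_A.T @ self.horiz_B.T  # length-compensation branch
        return self.base(x) + delta_v + delta_h
```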
We perform comprehensive experiments on the passage and document retrieval datasets from MSMARCO, along with all public datasets from the BEIR benchmark. In our experiments, Matryoshka Re-Ranker substantially outperforms existing methods, while effectively preserving its superior performance across various forms of compression and different application scenarios. Our source code has been uploaded to this anonymous repository.
Submission Number: 1519