POQD: Performance-Oriented Query Decomposer for Multi-vector retrieval

Yaoyang Liu; Junlin Li; Yinjun Wu; Zhen Chen

Published: 01 May 2025, Last Modified: 23 Jul 2025

Venue: ICML 2025 poster

License: CC BY 4.0

Abstract: Although Multi-Vector Retrieval (MVR) has achieved state-of-the-art results on many information retrieval (IR) tasks, its performance depends heavily on how queries are decomposed into smaller pieces, such as phrases or tokens. However, optimizing query decomposition for MVR performance is not end-to-end differentiable. Worse, jointly solving this problem and training downstream retrieval-based systems, such as Retrieval-Augmented Generation (RAG) systems, can be highly inefficient. To overcome these challenges, we propose the Performance-Oriented Query Decomposer (POQD), a novel query decomposition framework for MVR. POQD uses one LLM for query decomposition and searches for the optimal prompt with an LLM-based optimizer. We further propose an end-to-end training algorithm that alternately optimizes the prompt for query decomposition and the downstream models. As our theoretical analysis suggests, this algorithm can achieve superior MVR performance at a reasonable training cost. POQD can be integrated seamlessly into arbitrary retrieval-based systems such as RAG systems. Extensive empirical studies on representative RAG-based QA tasks show that POQD outperforms existing query decomposition strategies in both retrieval performance and end-to-end QA accuracy. POQD is available at https://github.com/PKU-SDS-lab/POQD-ICML25.
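
The alternating scheme described in the abstract can be pictured with a small schematic. This is only a toy sketch of generic alternating optimization, not the authors' algorithm: the function names (`alternate_training`, `propose_prompts`, `train_model`, `evaluate_loss`), the integer standing in for a prompt, the float standing in for the downstream model, and the quadratic loss are all hypothetical placeholders. In the actual system, candidate prompts would presumably come from an LLM-based optimizer and the loss from the downstream RAG pipeline.

```python
def alternate_training(prompt, model, propose_prompts, train_model,
                       evaluate_loss, rounds=3):
    """Alternate between a prompt search step and a model training step."""
    for _ in range(rounds):
        # Step 1: hold the model fixed and pick the best candidate prompt.
        # Keeping the current prompt among the candidates means this step
        # can never increase the loss.
        candidates = [prompt] + propose_prompts(prompt)
        prompt = min(candidates, key=lambda p: evaluate_loss(p, model))
        # Step 2: hold the prompt fixed and update the downstream model.
        model = train_model(prompt, model)
    return prompt, model


# --- Hypothetical stand-ins, chosen only so the sketch runs ---

def evaluate_loss(prompt, model):
    # Toy loss: lowest when prompt == 3 and the model matches the prompt.
    return (prompt - 3) ** 2 + (model - prompt) ** 2


def propose_prompts(prompt):
    # Stand-in for an LLM-based optimizer proposing nearby prompt variants.
    return [prompt - 1, prompt + 1]


def train_model(prompt, model):
    # Stand-in for a few training steps: move the model halfway to its
    # optimum under the current prompt.
    return model + 0.5 * (prompt - model)


initial_loss = evaluate_loss(0, 0.0)
final_prompt, final_model = alternate_training(
    0, 0.0, propose_prompts, train_model, evaluate_loss
)
final_loss = evaluate_loss(final_prompt, final_model)
```

In this toy setting neither step can raise the loss, so `final_loss` ends up below `initial_loss`. Whether and how quickly the real procedure converges is the subject of the paper's theoretical analysis, which this sketch does not reproduce.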

Lay Summary: In this work, we aim to improve the end-to-end performance of retrieval-based systems, such as RAG systems, by improving their retrieval process. To achieve this, we employ an emerging technique called multi-vector retrieval, which decomposes queries and retrieved data into fine-grained pieces and then uses a new score function to evaluate the overall query-data similarity. Although this technique has been quite effective, we found that the granularity at which queries are decomposed matters. Hence, we propose a novel method that optimizes how queries are decomposed so that end-to-end RAG performance is optimized. Both theoretical and empirical results demonstrate the effectiveness and efficiency of our method. This method could be widely used to enhance the performance of arbitrary RAG systems in a lightweight manner.
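
To make the idea of scoring fine-grained pieces concrete, here is a minimal sketch of one common multi-vector score, the MaxSim-style sum used in late-interaction retrievers: each query piece is matched to its most similar document piece, and the per-piece maxima are summed. This is an assumption for illustration; the page does not specify the score function POQD uses. The function names and the 2-D toy embeddings are made up.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query pieces, of the similarity to the best document piece."""
    return sum(max(cosine(q, d) for d in doc_vecs) for q in query_vecs)


# Toy embeddings: one vector per query piece or document piece.
query_pieces = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 2.0]]  # has a good match for both query pieces
doc_b = [[1.0, 0.0], [3.0, 0.0]]  # matches only the first query piece

score_a = maxsim_score(query_pieces, doc_a)
score_b = maxsim_score(query_pieces, doc_b)
```

With these toy vectors, `doc_a` scores higher than `doc_b` because it covers both query pieces. Under this kind of score, changing how the query is split changes which pieces get matched, which is why decomposition granularity can affect retrieval quality.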

Link To Code: https://github.com/PKU-SDS-lab/POQD-ICML25

Primary Area: Applications

Keywords: information retrieval, retrieval-augmented generation

Submission Number: 5409
