Learning Query-aware Embedding Index for Improving E-commerce Dense Retrieval

Published: 01 Jan 2023, Last Modified: 19 May 2025, Venue: SIGIR 2023, License: CC BY-SA 4.0
Abstract: The embedding index has become an essential part of dense retrieval (DR) systems, enabling fast search over billions of items in online E-commerce applications. To accelerate retrieval in industrial scenarios, most previous studies use only item embeddings. However, product quantization without query embeddings leads to inconsistency between queries and items. A straightforward solution is to put query embeddings into the product quantization process. But we found that the distance between positive query and item embedding pairs is too large, which means the query and item embeddings learned by the two-tower model are not fully aligned. This problem leads to performance decay when query embeddings are put directly into product quantization. In this paper, we propose a novel query-aware embedding index framework, which aligns the query and item embedding spaces to reduce the distance between positive pairs, so that query and item embeddings can be mixed to learn better cluster centers for product quantization. Specifically, we first propose a symmetric loss to train a better two-tower model and achieve space alignment. Subsequently, we propose a mixed quantization strategy that puts the query embeddings into the product quantization process to bridge the gap between queries and compressed item embeddings. Extensive experiments show that our framework significantly outperforms previous models on a real-world dataset, which demonstrates the superiority and effectiveness of the framework.
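As a rough illustration of the mixed quantization idea described in the abstract, the sketch below learns product quantization codebooks from a mixture of item and query embeddings instead of item embeddings alone. It is not the authors' implementation: the function names (`kmeans`, `train_mixed_pq`, `encode`, `decode`), the `query_ratio` mixing parameter, the plain k-means training, and the use of NumPy are all assumptions made for illustration, and the symmetric alignment loss is not modeled here.

```python
import numpy as np


def kmeans(x, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of cluster centers."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every point to every center.
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dist.argmin(1)
        for j in range(k):
            members = x[assign == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(0)
    return centers


def train_mixed_pq(items, queries, n_sub=4, k=16, query_ratio=0.5, seed=0):
    """Learn one codebook per subspace from items plus sampled queries.

    query_ratio is a hypothetical knob: the number of sampled queries
    relative to the number of items.
    """
    rng = np.random.default_rng(seed)
    n_q = min(len(queries), int(len(items) * query_ratio))
    sampled = queries[rng.choice(len(queries), size=n_q, replace=False)]
    train = np.concatenate([items, sampled], axis=0)
    # Split the embedding dimension into n_sub equal sub-vectors.
    return [kmeans(sub, k, seed=seed) for sub in np.split(train, n_sub, axis=1)]


def encode(x, codebooks):
    """Map each sub-vector to the index of its nearest center."""
    subs = np.split(x, len(codebooks), axis=1)
    codes = [((s[:, None, :] - c[None, :, :]) ** 2).sum(-1).argmin(1)
             for s, c in zip(subs, codebooks)]
    return np.stack(codes, axis=1)


def decode(codes, codebooks):
    """Reconstruct approximate embeddings from the stored codes."""
    return np.concatenate(
        [c[codes[:, i]] for i, c in enumerate(codebooks)], axis=1)
```

In this sketch only the item embeddings would be encoded and stored in the index; the queries influence where the cluster centers land but are not themselves compressed. Whether mixing helps depends on how well the two embedding spaces are aligned, which is the concern the abstract raises.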