Resource2Box: Learning To Rank Resources in Distributed Search Using Box Embedding

Ulugbek Ergashev, Geon Lee, Kijung Shin, Eduard C. Dragut, Weiyi Meng

Published: 2024, Last Modified: 16 Jan 2026. Venue: ICDM 2024. License: CC BY-SA 4.0

Abstract: The rapid and continuous growth of internet content poses significant challenges to conventional web search engines. Distributed Search (DS) offers a solution by integrating multiple information sources into a unified search system. When a user submits a query, the DS system selects relevant resources and ranks the documents within these selected resources. Recently, representation learning of queries and resources has been employed to enhance DS performance. However, existing methods that represent resources as vector embeddings may not sufficiently capture the semantic diversity within each resource. To address this limitation, we propose Resource2Box, a novel representation learning method for DS that models resources as boxes (i.e., hypercubes) in the latent space. Resource2Box more effectively captures the diverse and intricate information of documents within resources compared to single-point vector embeddings. It learns a box embedding for each resource, characterized by a center and offset, through two key processes: (1) aggregating document information within each resource using attentive pooling and (2) propagating information across resources. These box embeddings are learned to reflect the semantic relationships with training queries, utilizing a unique box-vector distance metric. Comprehensive experimentation on benchmark datasets demonstrates that Resource2Box significantly enhances resource selection, improving ranking performance by up to 24.7% across various metrics.
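
To make the center-and-offset idea concrete, the following is a minimal sketch of how a resource box might be scored against a query vector. It is not the paper's implementation. The abstract does not define its box-vector distance, so the sketch borrows the outside/inside distance from Query2Box as a stand-in, and the dot-product attention scoring, the function names (`attentive_pool`, `box_vector_distance`, `rank_resources`), and the `alpha` weight are all assumptions made for illustration. Cross-resource propagation and training are omitted.

```python
import math


def attentive_pool(doc_vecs, attn_vec):
    """Softmax-weighted average of document vectors.

    Assumption: attention scores are plain dot products with a
    hypothetical attention vector `attn_vec`.
    """
    scores = [sum(d_k * a_k for d_k, a_k in zip(d, attn_vec)) for d in doc_vecs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(doc_vecs[0])
    return [sum(w * d[k] for w, d in zip(weights, doc_vecs)) for k in range(dim)]


def box_vector_distance(center, offset, query, alpha=0.5):
    """Query2Box-style distance between an axis-aligned box and a point.

    Assumption: this stands in for the paper's box-vector metric.
    The outside term is zero when the query lies inside the box; the
    inside term (down-weighted by alpha) measures distance to the center.
    """
    dist_out = 0.0
    dist_in = 0.0
    for c, o, q in zip(center, offset, query):
        lo, hi = c - abs(o), c + abs(o)      # box bounds in this dimension
        dist_out += max(q - hi, 0.0) + max(lo - q, 0.0)
        clipped = min(hi, max(lo, q))        # query projected onto the box
        dist_in += abs(c - clipped)
    return dist_out + alpha * dist_in


def rank_resources(boxes, query, alpha=0.5):
    """Return resource names ordered from nearest to farthest box.

    `boxes` maps a resource name to a (center, offset) pair.
    """
    return sorted(
        boxes,
        key=lambda name: box_vector_distance(*boxes[name], query, alpha),
    )
```

For example, with hypothetical boxes `{"news": ([0.0, 0.0], [1.0, 1.0]), "sports": ([5.0, 5.0], [0.5, 0.5])}` and the query `[0.5, 0.0]`, `rank_resources` places `"news"` first because the query falls inside that box and therefore incurs only the down-weighted inside term.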