Abstract: The current paradigm in dense retrieval is to represent queries and passages as low-dimensional real-valued vectors using neural language models, and then compute query-passage similarity as the dot product of these vector representations. A limitation of this approach is that these learned representations cannot capture or express uncertainty. At the same time, information retrieval over large corpora involves several sources of uncertainty, such as misspelled or ambiguous text. Consequently, retrieval methods that incorporate uncertainty estimation are more likely to generalize well under such data distribution shifts. The multivariate representation learning (MRL) framework proposed by Zamani & Bendersky (2023) is the first method to take a step toward modeling uncertainty in dense retrieval. This framework represents queries and passages as multivariate normal distributions and computes query-passage similarity as the negative Kullback-Leibler (KL) divergence between these distributions. Furthermore, MRL reformulates the negative KL divergence as a dot product, allowing for efficient first-stage retrieval using standard maximum inner product search.
In this paper, we attempt to reproduce MRL under memory constraints (e.g., an academic computational budget). In particular, we focus on a memory-limited, single-GPU setup. We find that the original work (i) introduces a typographical/mathematical error early in the formulation of the method that propagates through the rest of the paper's mathematical formulations, and (ii) does not fully specify certain important design choices that can strongly influence performance. Accordingly, we correct the mathematical error and make reasonable design choices where important details are unspecified. Additionally, we expand on the results of the original paper with a thorough ablation study that provides more insight into the impact of the framework's different components. While we confirm that MRL can achieve state-of-the-art performance, we could not reproduce the results reported in the original paper, nor the reported trends against the baselines, under a memory-limited setup that facilitates fair comparisons. Our analysis offers insights as to why that is the case. Most importantly, our empirical results suggest that the variance definition in MRL does not consistently capture uncertainty. The source code for our reproducibility study is available at: https://anonymous.4open.science/r/multivariate_ir_code_release-AB26.
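To illustrate the dot-product reformulation mentioned in the abstract, the sketch below shows the standard algebra for flattening the negative KL divergence between two diagonal-covariance Gaussians into an inner product of augmented vectors, which makes the similarity compatible with maximum inner product search. This is a minimal sketch under our own assumptions: the function names are hypothetical, and the exact vector construction used by MRL may differ.

```python
# Minimal sketch (not MRL's exact construction) of how the negative KL
# divergence between two diagonal-covariance Gaussians can be flattened
# into a dot product, enabling standard maximum inner product search.
import numpy as np

def query_vector(mu_q, var_q):
    """Augmented query representation. Terms that are constant in the
    passage (and the global factor 1/2) are dropped, since they do not
    affect the ranking of passages for a fixed query."""
    ones = np.ones_like(mu_q)
    # Per dimension i: [-(var_q + mu_q^2), 2*mu_q, -1, -1]
    return np.concatenate([-(var_q + mu_q**2), 2.0 * mu_q, -ones, -ones])

def passage_vector(mu_p, var_p):
    """Augmented passage representation, indexable offline."""
    # Per dimension i: [1/var_p, mu_p/var_p, mu_p^2/var_p, log var_p]
    return np.concatenate([1.0 / var_p, mu_p / var_p,
                           mu_p**2 / var_p, np.log(var_p)])

# Sanity check against a direct diagonal-Gaussian KL computation:
# the inner product equals -2 * KL(Q || P) up to query-only terms,
# so ranking by it is equivalent to ranking by -KL(Q || P).
rng = np.random.default_rng(0)
mu_q, mu_p = rng.normal(size=4), rng.normal(size=4)
var_q, var_p = rng.uniform(0.5, 2.0, 4), rng.uniform(0.5, 2.0, 4)
kl = 0.5 * np.sum(np.log(var_p / var_q)
                  + (var_q + (mu_q - mu_p)**2) / var_p - 1)
dot = query_vector(mu_q, var_q) @ passage_vector(mu_p, var_p)
q_only = np.sum(np.log(var_q)) + len(mu_q)  # query-only terms dropped above
assert np.isclose(dot + q_only, -2 * kl)
```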
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=1u9WOhpISC
Changes Since Last Submission: 1. In this version, we explicitly state in the paper that this work is a reproducibility study under limited computational resources.
2. As requested by the reviewers, we toned down several instances of overly strong language in the paper.
3. In response to the reviewers' request, we have included (i) a new experiment in which the model is trained with a batch size of 512, and (ii) an ablation study on the effect of batch size on the retrieval performance of the model. We conclude that the impressive results reported in the original paper are most likely due to training with a large batch size for many training steps.
4. We report higher scores than in our previous submission for MRL's performance when trained with the original training setup. To this end, in the new version of the paper, we no longer follow the CLDRD training setup to train a more effective MRL model. However, we still report the retrieval performance of MRL under the CLDRD training setup as part of our ablation study in Section 5.3, since it helps explain the performance difference between MRL and its main competitor, CLDRD.
Assigned Action Editor: ~Antoine_Patrick_Isabelle_Eric_Ledent1
Submission Number: 3191