An Encoder Attribution Analysis for Dense Passage Retriever in Open-Domain Question Answering

Anonymous

08 Mar 2022 (modified: 05 May 2023)
NAACL 2022 Conference Blind Submission
Readers: Everyone
Paper Link: https://openreview.net/forum?id=LrR-9tt62rw
Paper Type: Long paper (up to eight pages of content + unlimited references and appendices)
Abstract: The bi-encoder design of the dense passage retriever (DPR) is a key factor in its success in open-domain question answering (QA). However, it is unclear how DPR's question encoder and passage encoder individually contribute to the overall performance, which we refer to as the encoder attribution problem. The problem is important because solving it helps us isolate the factors responsible for each encoder's behavior and thereby further improve overall performance. In this paper, we formulate our analysis under a probabilistic framework called encoder marginalization, where we quantify the contribution of a single encoder by marginalizing over other variables. We find that the passage encoder contributes more than the question encoder to the in-domain retrieval accuracy. We further use an example to demonstrate how to find the factors affecting each encoder: we train multiple DPR models with different amounts of data and use encoder marginalization to analyze the results. We find that the positive passage overlap and corpus coverage of the training data strongly affect the passage encoder, while the question encoder is mainly affected by training sample complexity under this setting. Based on this framework, we can devise data-efficient training regimes: for example, we manage to train a passage encoder on SQuAD using 60% less training data without loss of accuracy. These results illustrate the utility of our encoder attribution analysis.
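
The abstract does not spell out the estimator behind encoder marginalization, so the following is only a minimal sketch of the general idea, under loudly stated assumptions: every question encoder is paired with every passage encoder, the retrieval accuracy of each pair is recorded, and one encoder's contribution is taken as its accuracy averaged over the other encoder with a uniform weighting. The function name `marginal_scores`, the encoder labels, the uniform weighting, and all numbers are hypothetical placeholders, not results or definitions from the paper.

```python
from statistics import mean


def marginal_scores(acc):
    """Average a pairwise accuracy table over the *other* encoder.

    acc[q][p] is the retrieval accuracy obtained when question encoder q
    is paired with passage encoder p. A uniform weighting over encoders
    is an assumption of this sketch, not something stated in the abstract.
    """
    q_names = list(acc)
    p_names = list(next(iter(acc.values())))
    # Contribution of each question encoder: average over passage encoders.
    q_marginal = {q: mean(acc[q][p] for p in p_names) for q in q_names}
    # Contribution of each passage encoder: average over question encoders.
    p_marginal = {p: mean(acc[q][p] for q in q_names) for p in p_names}
    return q_marginal, p_marginal


# Placeholder accuracies for illustration only (not from the paper).
acc = {
    "q_A": {"p_A": 0.80, "p_B": 0.60},
    "q_B": {"p_A": 0.70, "p_B": 0.50},
}
q_marg, p_marg = marginal_scores(acc)
for name, score in {**q_marg, **p_marg}.items():
    print(f"{name}: {score:.2f}")
```

In this toy table, swapping the passage encoder changes the averaged accuracy more than swapping the question encoder does, which is the kind of comparison such an analysis is meant to expose.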