MultiContrievers: Analysis of Dense Retrieval Representations

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: visualization or interpretation of learned representations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: information theory, probing, retrieval, dense retrieval, gender bias, fairness
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We train 25 dense retrievers, 'MultiContrievers', from different initial seeds and apply information-theoretic probing to analyse variability in the information encoded in dense retrieval representations, examining both performance and fairness.
Abstract: Dense retrievers compress source documents into vector representations; the information they encode determines what is available to downstream tasks (e.g., QA, summarisation). Yet there is little analysis of the information in retriever representations. We conduct the first analysis comparing the information captured in dense retriever representations with that in language model representations. To do so, we present MultiContrievers, 25 contrastive dense retrievers initialised from the 25 MultiBerts. We use information-theoretic probing to analyse how well MultiContrievers encode two example pieces of information, topic and demographic gender (measured as the extractability of these two concepts), and we correlate extractability with performance on 14 retrieval datasets covering seven distinct retrieval tasks. We find that: 1) MultiContriever contrastive training increases the extractability of both topic and gender, but also has a regularisation effect: MultiContrievers are more similar to each other than the MultiBerts are; 2) extractability of both topic and gender correlates poorly with benchmark performance, revealing a gap between the effect of the training objective on representations and the qualities the benchmark rewards; 3) MultiContriever representations show strong potential for gender bias, and we do find allocational gender bias in retrieval benchmarks; however, a causal analysis shows that the representations are not the source of this bias, suggesting that, despite this potential, current gender bias comes from the queries or the retrieval corpus and cannot be corrected by improvements to modelling alone; and 4) significant variability across random seeds, suggesting that future work should test across a broad spread of seeds, which is not currently standard. We release our 25 MultiContrievers (including intermediate checkpoints) and all code to facilitate further analysis.
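For context on the probing method: below is a minimal sketch of one standard information-theoretic probing recipe, online (MDL) codelength in the style of Voita & Titov (2020), which measures how extractable a concept (here, binary gender labels) is from fixed embeddings. Everything in it, the function name, data layout, block fractions, and probe choice, is an illustrative assumption rather than the authors' released code.

import numpy as np
from sklearn.linear_model import LogisticRegression

def online_codelength(X, y, n_classes=2,
                      fractions=(0.001, 0.002, 0.004, 0.008, 0.016,
                                 0.032, 0.0625, 0.125, 0.25, 0.5, 1.0)):
    """Online (MDL) codelength, in bits, of labels y given representations X.

    Lower codelength = higher extractability. Assumes X and y are numpy
    arrays, y is integer-coded 0..n_classes-1, and every class appears in
    each training prefix (so predict_proba columns line up with labels).
    """
    n = len(y)
    blocks = [int(f * n) for f in fractions]
    total_bits = blocks[0] * np.log2(n_classes)  # first block: uniform code
    for i, j in zip(blocks[:-1], blocks[1:]):
        # Train a probe on everything transmitted so far ...
        probe = LogisticRegression(max_iter=1000).fit(X[:i], y[:i])
        # ... and pay for the next block under that probe's predictions.
        probs = probe.predict_proba(X[i:j])  # shape (j - i, n_classes)
        total_bits += -np.log2(probs[np.arange(j - i), y[i:j]]).sum()
    return total_bits

# Hypothetical usage: compare extractability of gender across seeds, where
# embeddings[seed] holds one MultiContriever's document vectors.
# bits = {seed: online_codelength(embeddings[seed], gender_labels)
#         for seed in range(25)}

Compression, i.e. the uniform codelength divided by this online codelength, is the usual summary statistic in MDL probing; if the paper follows this standard recipe, its extractability comparisons across the 25 seeds are of exactly this kind.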
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5740