Abstract: Dense retrieval methods have demonstrated promising performance in multilingual information retrieval, where queries and documents can be in different languages. However, dense retrievers typically require a substantial amount of paired training data, a requirement that is even harder to satisfy in multilingual scenarios. This paper introduces \textbf{UMR}, an \underline{U}nsupervised \underline{M}ultilingual dense \underline{R}etriever trained without any paired data. Our approach leverages the generative capabilities of multilingual language models to acquire pseudo labels for training dense retrievers. Experimental results on two benchmark datasets show that UMR can outperform all supervised baselines, showcasing the potential of training multilingual retrievers without paired data and thereby enhancing their practicality.
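The abstract does not spell out how the generative pseudo-labeling works, but one plausible reading is query-likelihood scoring with a multilingual seq2seq LM: score each candidate passage by how likely the LM is to generate the query from it, and treat the best-scoring passage as a pseudo-positive. The sketch below follows that reading; the backbone (mT5), the prompt, and the scoring function are illustrative assumptions, not the authors' actual recipe.

```python
# Illustrative sketch only: one way a multilingual seq2seq LM could produce
# pseudo labels for retriever training, by scoring each candidate passage
# with the LM's likelihood of generating the query (query-likelihood style).
# The model name, prompt, and scoring are assumptions, not the paper's recipe.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")  # assumed backbone
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-base").eval()

@torch.no_grad()
def pseudo_label_scores(query: str, passages: list[str]) -> list[float]:
    """Mean token log-probability of `query` given each candidate passage."""
    prompts = [f"Passage: {p} Write a question for this passage." for p in passages]
    inputs = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True)
    labels = tokenizer([query] * len(passages), return_tensors="pt", padding=True).input_ids
    labels[labels == tokenizer.pad_token_id] = -100  # ignore padding positions
    logits = model(**inputs, labels=labels).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    mask = labels != -100
    token_ll = log_probs.gather(-1, labels.clamp(min=0).unsqueeze(-1)).squeeze(-1)
    return ((token_ll * mask).sum(-1) / mask.sum(-1)).tolist()

# The highest-scoring passage per query would act as a pseudo-positive
# (the rest as negatives) for contrastive training of a dense retriever.
scores = pseudo_label_scores("Who wrote War and Peace?",
                             ["Leo Tolstoy wrote War and Peace.",
                              "Helsinki is the capital of Finland."])
```

Under this assumed setup, the resulting pseudo-positives and negatives would feed a standard contrastive (InfoNCE-style) loss over a dual-encoder retriever, requiring no human-annotated query-document pairs.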
Paper Type: long
Research Area: Information Retrieval and Text Mining
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: Arabic, Bengali, Finnish, Japanese, Korean, Russian, Telugu, English
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.