C$^2$LIR: Continual Cross-Lingual Transfer for Low-Resource Information Retrieval

Jaeseong Lee, Dohyeon Lee, Jongho Kim, Seung-won Hwang

Published: 01 Jan 2023, Last Modified: 13 Mar 2026, License: CC BY-SA 4.0

Abstract: This paper proposes a method to train an information retrieval (IR) model for a low-resource language (LRL) given only a small corpus and no parallel sentences. Although neural IR models based on pretrained language models (PLMs) perform well in high-resource languages (HRLs), building a PLM for an LRL is challenging. We propose C$^2$LIR, a method for building a high-performing neural IR model for an LRL that uses dictionary-based pretraining objectives for cross-lingual transfer from an HRL. Experiments on monolingual and cross-lingual IR in diverse low-resource scenarios show the effectiveness and data efficiency of C$^2$LIR.
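The abstract does not spell out the dictionary-based pretraining objectives. As one hypothetical illustration of how a bilingual dictionary alone (no parallel sentences) can inject cross-lingual signal into pretraining data, the sketch below does dictionary-based code-switching: words in a sentence are randomly replaced by their dictionary translations. This is a generic, commonly used technique, not necessarily the objective C$^2$LIR uses; the function `code_switch`, its `switch_prob` parameter, and the toy English-to-Swahili dictionary are all assumptions for illustration.

```python
import random
from typing import Dict, List, Optional


def code_switch(
    tokens: List[str],
    bilingual_dict: Dict[str, str],
    switch_prob: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Hypothetical helper: swap dictionary words for their translations.

    Each token whose lowercased form has a dictionary entry is replaced
    with probability `switch_prob`; tokens without an entry are kept.
    Dictionary keys are assumed to be lowercase.
    """
    if not 0.0 <= switch_prob <= 1.0:
        raise ValueError("switch_prob must be in [0, 1]")
    rng = rng or random.Random()
    switched = []
    for tok in tokens:
        translation = bilingual_dict.get(tok.lower())
        # rng.random() is in [0, 1), so switch_prob=1.0 always swaps
        # and switch_prob=0.0 never does.
        if translation is not None and rng.random() < switch_prob:
            switched.append(translation)
        else:
            switched.append(tok)
    return switched


if __name__ == "__main__":
    toy_dict = {"house": "nyumba", "water": "maji"}  # hypothetical toy HRL->LRL dictionary
    sentence = ["the", "house", "has", "water"]
    print(code_switch(sentence, toy_dict, switch_prob=1.0))
    # -> ['the', 'nyumba', 'has', 'maji']
```

The mixed-language sentences produced this way could then feed a standard objective such as masked language modeling, so that the model sees HRL and LRL words in shared contexts. Passing a seeded `random.Random` makes the augmentation reproducible.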

External IDs: doi:10.1007/978-3-031-28238-6_37