CROSS: Analyzing the Trade-offs in Long-Context Cross-lingual Retrieval

Published: 06 Mar 2025, Last Modified: 18 Mar 2025. ICLR 2025 FM-Wild Workshop. License: CC BY 4.0
Keywords: Cross-lingual information retrieval, Cost-efficient retrieval, Multi-target reasoning
TL;DR: CROSS is a scalable, cost-efficient retrieval framework that improves long-context and cross-lingual information retrieval by reducing token processing and mitigating mid-context failures.
Abstract: Cross-lingual information retrieval in long-context settings faces challenges such as the "lost-in-the-middle" phenomenon and computational inefficiencies. We introduce CROSS (Cross-lingual Retrieval Optimized for Scalable Solutions), a two-phase retrieval framework that integrates multilingual embeddings with efficient candidate selection to enhance retrieval-augmented generation (RAG). Evaluating CROSS on the newly developed mLongRR-V2 benchmark—covering seven languages and 49 language pairs—we demonstrate substantial improvements in retrieval accuracy, scalability to 512,000-token contexts, and robustness across linguistic structures. Compared to baseline large language models (LLMs), CROSS significantly mitigates mid-context retrieval failures while reducing computational overhead. Our results establish CROSS as an efficient and scalable solution for multilingual long-context retrieval.
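The two-phase design described in the abstract can be illustrated with a minimal sketch. The `toy_embed` function below is a hypothetical stand-in for a real multilingual embedding model (the paper does not specify its implementation here); the point is the candidate-selection phase, which scores chunks in embedding space and forwards only the top-k to the generation phase, reducing the tokens the LLM must process:

```python
import math

def toy_embed(text, dim=16):
    # Hypothetical stand-in for a multilingual sentence embedder;
    # CROSS would use a trained multilingual embedding model here.
    vec = [0.0] * dim
    for i, ch in enumerate(text.lower()):
        vec[(ord(ch) + i) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Both vectors are unit-normalized, so the dot product is the
    # cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def select_candidates(query, chunks, k=2):
    # Phase 1: score every chunk against the query in embedding
    # space and keep only the top-k candidates, so the generator
    # (phase 2) sees far fewer tokens than the full long context.
    q = toy_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)
    return ranked[:k]
```

Because only the k selected chunks reach the LLM, retrieval cost grows with the number of candidates rather than the full context length, which is one way a pipeline like this can avoid mid-context ("lost-in-the-middle") failures.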
Submission Number: 68