Abstract: Large Language Models (LLMs) excel at cross-lingual question answering (QA) but often hallucinate when the query is misaligned with the model's internal knowledge representation. Retrieval-augmented generation (RAG) mitigates this issue, yet its quality degrades when retrieval is inconsistent across languages. We propose a retrieval method that improves recall and re-ranking by strengthening semantic alignment across languages. Our approach integrates a language-aware retrieval mechanism with a fine-tuned encoder model, ParaXLM-SR, to refine query-context matching and prioritize relevant information. A bias-adjusted similarity re-ranking step further suppresses cross-lingual retrieval noise and improves context relevance.
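The abstract does not specify the exact form of the bias-adjusted re-ranking. A minimal Python sketch is given below, assuming the adjustment is a per-language offset (e.g., the mean similarity surplus of each language estimated on held-out data) subtracted from cosine scores; the function name `rerank_bias_adjusted` and all inputs are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np

def rerank_bias_adjusted(query_emb, ctx_embs, ctx_langs, lang_bias, top_k=5):
    """Re-rank retrieved contexts by bias-adjusted cosine similarity.

    Hypothetical sketch: lang_bias maps a language code to an offset
    (assumed here to be estimated offline, e.g., the per-language mean
    similarity on held-out data) that is subtracted from raw scores.
    """
    # L2-normalize so the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    C = ctx_embs / np.linalg.norm(ctx_embs, axis=1, keepdims=True)
    sims = C @ q  # cosine similarity of each context to the query
    # Subtract the per-language offset so languages whose embeddings score
    # high on average do not crowd out relevant contexts in other languages.
    adjusted = sims - np.array([lang_bias.get(lang, 0.0) for lang in ctx_langs])
    order = np.argsort(-adjusted)[:top_k]
    return [(int(i), float(adjusted[i])) for i in order]

# Toy usage: four contexts in three languages, 8-dim embeddings.
rng = np.random.default_rng(0)
query = rng.normal(size=8)
contexts = rng.normal(size=(4, 8))
langs = ["en", "en", "sw", "bn"]
bias = {"en": 0.05}  # assumed surplus for a high-resource language
print(rerank_bias_adjusted(query, contexts, langs, bias, top_k=2))
```

In this sketch the embeddings would come from the cross-lingual encoder (ParaXLM-SR in the paper); a subtractive offset is only one plausible reading of "bias-adjusted", and the actual method may calibrate scores differently.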
Paper Type: Long
Research Area: Multilingualism and Cross-Lingual NLP
Research Area Keywords: Information Retrieval and Text Mining, Multilinguality and Language Diversity, Machine Learning for NLP, Interpretability and Analysis of Models for NLP
Contribution Types: NLP engineering experiment
Languages Studied: English, French, German, Russian, Swahili, Spanish, Chinese, Arabic, Bengali, Finnish, Indonesian, Korean
Submission Number: 4188