What Breaks Multilingual Long-Form RAG? An Experimental Study of Attribution Errors in Report Generation

Published: 01 May 2026, Last Modified: 01 May 2026, RAG4Report 2026 Poster, CC BY 4.0
Keywords: Long-Form Generation, Multilingual NLP, Attribution Fidelity, Cross-Lingual Semantic Drift, Fact Verification
TL;DR: We study multilingual long-form report generation with RAG and show that attribution errors increase with translation and with report length, revealing a trade-off between coverage and citation fidelity.
Abstract: Retrieval-Augmented Generation (RAG) can improve the factual grounding of text generation by incorporating external documents. While prior work has examined RAG for short-form question answering, its performance in long-form report generation—particularly in multilingual settings—remains less well understood. In this study, we conduct an exploratory investigation of multilingual long-form RAG, focusing on attribution faithfulness, i.e., the extent to which generated claims are supported by cited sources. We evaluate four representative pipeline configurations across two language pairs (English–German and English–Hindi) on a set of 200 report prompts, generating short to medium-length reports (approximately 300–700 words). Our analysis suggests that multilingual pipelines tend to introduce more attribution inconsistencies than monolingual baselines, that translation-based strategies can improve coverage but occasionally reduce citation fidelity, and that longer reports exhibit modestly lower attribution quality. Prompting strategies provide limited improvements. These findings highlight practical challenges in developing reliable multilingual report generation systems and underscore the importance of careful attribution evaluation.
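The central quantity in the abstract, attribution faithfulness, can be made concrete with a small example. The sketch below is not the authors' evaluation code; it is a minimal Python illustration of one way to score a report, namely the fraction of claims whose cited passages support them. The Claim data model, the lexical-overlap support test, and the 0.5 threshold are all hypothetical simplifications; a real evaluation would typically use an entailment (NLI) model rather than word overlap.

    from dataclasses import dataclass

    # Hypothetical data model: a generated claim paired with the passages it cites.
    @dataclass
    class Claim:
        text: str
        cited_passages: list[str]

    def _content_words(text: str) -> set[str]:
        # Crude content-word proxy: lowercased alphabetic tokens longer than 3 characters.
        cleaned = "".join(c if c.isalpha() else " " for c in text.lower())
        return {w for w in cleaned.split() if len(w) > 3}

    def claim_supported(claim: Claim, overlap_threshold: float = 0.5) -> bool:
        # A claim counts as supported if enough of its content words
        # appear in at least one of its cited passages.
        words = _content_words(claim.text)
        if not words:
            return False
        return any(
            len(words & _content_words(p)) / len(words) >= overlap_threshold
            for p in claim.cited_passages
        )

    def attribution_fidelity(claims: list[Claim]) -> float:
        # Fraction of claims whose cited sources support them (0.0 for an empty report).
        if not claims:
            return 0.0
        return sum(claim_supported(c) for c in claims) / len(claims)

    # Usage: one claim grounded in its citation, one not.
    report = [
        Claim("Multilingual pipelines introduce more attribution inconsistencies.",
              ["Our analysis found multilingual pipelines introduce more attribution inconsistencies."]),
        Claim("Translation always improves citation fidelity.",
              ["Translation-based strategies improved coverage in our experiments."]),
    ]
    print(attribution_fidelity(report))  # 0.5

Under this kind of metric, the paper's reported trade-off would appear as translation-based pipelines retrieving more passages (higher coverage) while a smaller fraction of claims pass the support check (lower fidelity).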
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 3