Empowering Low-Resource Languages: TraSe Architecture for Enhanced Retrieval-Augmented Generation in Bangla

NAACL 2025 Workshop LM4UC, Submission 6 Authors

Published: 04 Mar 2025, Last Modified: 21 Mar 2025 · LM4UC · License: CC BY 4.0
Keywords: Bangla Language, RAG, LLM, Question-answering, Low-resource language
TL;DR: This paper introduces the TraSe architecture, which enhances RAG for Bangla using Translative prompting.
Abstract: Research on Retrieval-Augmented Generation (RAG) for low-resource languages has been sparse because of limited resources. To address this gap, we focus on Bangla, a low-resource language, and build a dataset of 200 question-answer pairs from Bangla Wikipedia dumps as the basis for our study. This paper introduces the TraSe architecture, which enhances RAG for Bangla using Translative prompting. Our experiments show that TraSe improves answer selection accuracy, reaching 34% with automatic retrieval and 63% with Human-in-the-Loop retrieval and outperforming baseline methods. The TraSe architecture marks a significant advancement in RAG for low-resource languages and has the potential to improve question-answering systems for Bangla and similar languages. Future research could extend the approach to other low-resource languages. The code is available at https://github.com/Atia6/TraSe-Bangla-RAG.
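To make the reported accuracy figures concrete, here is a minimal sketch of how an answer selection accuracy over a fixed question set could be computed, and what the reported percentages would mean in absolute counts. It assumes accuracy is the fraction of questions whose selected answer exactly matches the gold answer, and that both figures are measured over all 200 pairs; the helper names `answer_selection_accuracy` and `correct_count` are hypothetical and not taken from the TraSe repository.

```python
from typing import Sequence


def answer_selection_accuracy(selected: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of questions whose selected answer matches the gold answer.

    Hypothetical helper: exact string match after stripping whitespace is an
    assumption, not necessarily the matching rule used in the paper.
    """
    if len(selected) != len(gold):
        raise ValueError("selected and gold must have the same length")
    if not gold:
        raise ValueError("need at least one question")
    correct = sum(s.strip() == g.strip() for s, g in zip(selected, gold))
    return correct / len(gold)


def correct_count(accuracy: float, n_questions: int = 200) -> int:
    """Number of correct answers implied by an accuracy over n_questions."""
    return round(accuracy * n_questions)


if __name__ == "__main__":
    # Assumes both reported figures are computed over all 200 pairs.
    print(correct_count(0.34))  # automatic retrieval
    print(correct_count(0.63))  # Human-in-the-Loop retrieval
```

Under that assumption, 34% corresponds to 68 of the 200 questions and 63% to 126.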
Archival: Archival Track
Participation: Virtual
Presenter: Atia Shahnaz Ipa
Submission Number: 6