RAG-Based AI Agents for Multilingual Help Desks in Low-Bandwidth Environments

TMLR Paper6208 Authors

14 Oct 2025 (modified: 05 Nov 2025). Withdrawn by Authors. License: CC BY 4.0
Abstract: The increasing demand for multilingual help desk systems calls for solutions that provide accurate, real-time responses across many languages. This paper presents a retrieval-augmented generation (RAG) system optimized for low-bandwidth environments. The system couples retrieval techniques with generative models, enabling it to produce contextually relevant responses while minimizing latency. To operate under limited bandwidth, we apply model distillation and token compression, which reduce model size and response time. The system's performance is evaluated on multilingual datasets, where it shows substantial improvements over baseline models in accuracy, recall, precision, and F1-score. Our approach addresses the challenges of multilingual support, retrieval accuracy, and low-latency performance, making it a viable solution for real-time customer support in resource-constrained settings. The findings suggest that the proposed system can serve as a robust platform for multilingual help desks, offering improved scalability and efficiency. The system uses a hybrid retriever-generator architecture, with a cross-lingual transformer for retrieval and a transformer-based sequence-to-sequence model for generation. Multilingual datasets, including TyDiQA, mMARCO, XQuAD, MLDoc, and AfriSenti, were used for training and evaluation, and model distillation and token compression were applied as the low-bandwidth optimizations. The proposed system achieved higher exact match (EM), BLEU, and mean reciprocal rank (MRR) scores than the baseline models, with an EM of 79.2%, a BLEU score of 32.8, and an MRR of 0.80, while reducing latency from 3.4 s for the baseline to 2.1 s. The distilled model further reduced latency to 1.8 s with minor performance trade-offs. Error analysis showed lower hallucination rates and more relevant responses for low-resource languages.
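The abstract reports EM, MRR, and latency figures without defining how they are computed. The following is a minimal Python sketch of the usual definitions, offered only as an illustration and not as the authors' evaluation code. It assumes a simplified SQuAD-style normalization for exact match (lowercasing, removing ASCII punctuation, collapsing whitespace; the SQuAD script additionally strips English articles, which is omitted here for multilingual text) and MRR computed over one ranked list of passage IDs per query. All function names are hypothetical. The latency helper simply reproduces the relative reductions implied by the reported 3.4 s, 2.1 s, and 1.8 s figures.

```python
import string
from typing import Sequence, Set


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, drop ASCII punctuation, collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions that equal their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(predictions)


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]], relevant: Sequence[Set[str]]) -> float:
    """Mean of 1/rank of the first relevant passage per query (0 if none is retrieved)."""
    if len(rankings) != len(relevant):
        raise ValueError("rankings and relevant must have the same length")
    if not rankings:
        return 0.0
    total = 0.0
    for ranked_ids, gold in zip(rankings, relevant):
        for rank, doc_id in enumerate(ranked_ids, start=1):
            if doc_id in gold:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)


def latency_reduction(baseline_s: float, system_s: float) -> float:
    """Relative latency reduction in percent."""
    return 100.0 * (baseline_s - system_s) / baseline_s


if __name__ == "__main__":
    print(exact_match(["Nairobi.", "blue"], ["nairobi", "red"]))  # 0.5
    print(mean_reciprocal_rank([["d3", "d1", "d2"], ["d5", "d4"]], [{"d1"}, {"d9"}]))  # 0.25
    print(round(latency_reduction(3.4, 2.1), 1))  # 38.2 (full system vs. baseline)
    print(round(latency_reduction(3.4, 1.8), 1))  # 47.1 (distilled model vs. baseline)
```

Under these assumptions, the reported latencies correspond to roughly a 38% reduction for the full system and a 47% reduction for the distilled model relative to the baseline.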
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Ruqi_Zhang1
Submission Number: 6208