Trustworthy Answers, Messier Data: Bridging the Gap in Low-Resource Retrieval-Augmented Generation for Domain Expert Systems

ACL ARR 2025 February Submission 434 Authors

07 Feb 2025 (modified: 09 May 2025) | ACL ARR 2025 February Submission | Readers: Everyone | License: CC BY 4.0
Abstract: Retrieval-augmented generation (RAG) has become a key technique for enhancing large language models (LLMs): it reduces hallucinations and supplies external knowledge, especially in domain expert systems where LLMs may lack sufficient up-to-date knowledge. However, developing these systems in low-resource settings introduces several challenges: (1) handling heterogeneous data sources, (2) optimizing the retrieval phase for trustworthy answers, and (3) evaluating generated answers across diverse aspects. To address these, we introduce a data generation pipeline that transforms raw multi-modal data into a structured corpus and Q&A pairs, an advanced re-ranking phase that improves retrieval precision, and a reference matching algorithm that enhances answer traceability. Applied to the automotive engineering domain, our system improves factual correctness (+1.94), informativeness (+1.16), and helpfulness (+1.67) over a non-RAG baseline. These results highlight the effectiveness of our approach across distinct aspects, with strong answer grounding and transparency.
Paper Type: Long
Research Area: Information Retrieval and Text Mining
Research Area Keywords: retrieval-augmented generation, NLP in resource-constrained settings, dense retrieval, re-ranking
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: Korean
Submission Number: 434