Abstract: Retrieval-augmented generation (RAG) is applied across diverse domains, allowing large language models (LLMs) to be employed for in-domain question answering without fine-tuning the generative LLMs on in-domain data. In this work, we analyze the applicability of RAG to procurement validation. We compare various configurations of the methods involved in the RAG process and identify the best-performing ones for procurement validation. Specifically, we analyze the impact of text extraction libraries, segmentation strategies with different segment sizes, embedding model selection, and prompt construction methods. Furthermore, we show that the recall of document retrieval can be improved by fine-tuning the embedding model with in-domain data, namely a collection of procurement documents in Latvian. Our best-performing configuration achieves a procurement validation accuracy of 70.73% on a publicly available procurement validation dataset for Latvian.
External IDs: dblp:conf/icaart/DeksneSPHJPR26