Abstract: Despite being essential sources of information, Brazilian medicine package leaflets remain underutilized because of their complexity and the lack of user-friendly tools for retrieving information from them. Currently, there are no chat-based systems in Portuguese designed to help patients access and understand leaflet content. To address this gap, we present RagPharma, a novel Retrieval-Augmented Generation (RAG) system that integrates professional medicine leaflets into a chat interface to answer patient queries. During RagPharma's development, we observed that evaluation performance was significantly higher when the test questions were derived from the same dataset used to build the system. This led us to identify a critical evaluation bias that is often overlooked in RAG applications. In response, we propose a novel dual-dataset evaluation framework that separates the knowledge base and the evaluation source into distinct but related datasets. Experimental results confirmed the presence of bias when overlapping datasets are used and demonstrated the reliability of our dual-dataset methodology. Under this new evaluation scheme, RagPharma achieved 81% accuracy using the Mistral 7B model, a 60% improvement over standalone LLMs. These findings validate both the effectiveness of RagPharma and the importance of unbiased evaluation strategies for domain-specific RAG systems.
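The separation behind the dual-dataset evaluation can be illustrated with a minimal Python sketch. It assumes that each evaluation question records which dataset and which passage it was derived from; the names `EvalItem`, `find_separation_violations`, and `accuracy`, and the dataset labels `"professional"` and `"patient"`, are hypothetical and are not taken from RagPharma. The check flags any question drawn from the knowledge-base dataset or copied verbatim from an indexed passage, which is the kind of overlap described above as inflating evaluation scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalItem:
    """One evaluation question and its provenance (hypothetical schema)."""
    question: str
    expected_answer: str
    source_dataset: str  # hypothetical label, e.g. "professional" or "patient"
    source_text: str     # passage the question was derived from


def find_separation_violations(
    kb_passages: list[str], eval_items: list[EvalItem], kb_dataset: str
) -> list[EvalItem]:
    """Return evaluation items that break the dual-dataset separation.

    An item is a violation if it was derived from the same dataset that was
    indexed as the knowledge base, or if its source passage appears verbatim
    among the indexed passages.
    """
    indexed = {p.strip() for p in kb_passages}
    return [
        item
        for item in eval_items
        if item.source_dataset == kb_dataset or item.source_text.strip() in indexed
    ]


def accuracy(judgements: list[bool]) -> float:
    """Fraction of answers judged correct; 0.0 for an empty evaluation set."""
    return sum(judgements) / len(judgements) if judgements else 0.0


# Toy example: the knowledge base is built from the "professional" dataset.
kb = ["Drug X: the adult dose is 500 mg every 6 hours."]
items = [
    EvalItem("How often can I take drug X?", "every 6 hours",
             "patient", "Take drug X every 6 hours."),
    EvalItem("What is the adult dose of drug X?", "500 mg",
             "professional", "Drug X: the adult dose is 500 mg every 6 hours."),
]
violations = find_separation_violations(kb, items, kb_dataset="professional")
print([v.question for v in violations])  # ['What is the adult dose of drug X?']
```

Only items that pass this check would be scored. How each answer is judged correct (exact match, human review, or an LLM judge) is deliberately left open, since the sketch only assumes a list of boolean judgements.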
External IDs: doi:10.5753/jbcs.2025.5767