A Privacy-Preserving Framework for Medical Chatbot Based on LLM with Retrieval Augmented Generation

Published: 01 Jan 2024, Last Modified: 15 May 2025, Venue: NLPCC (3) 2024, License: CC BY-SA 4.0
Abstract: With the continuous advancement of Large Language Models (LLMs) and generative AI technologies, a growing number of organizations are customizing and fine-tuning LLMs on internal data and documents to make data-driven decision-making more efficient and reliable. However, these approaches raise challenges in data privacy, protection, and governance. In this paper, we investigate how to preserve privacy when building medical Knowledge Base Question Answering (KBQA) systems on top of LLMs. We propose a solution that combines fine-grained access control, de-identification of sensitive data, and prompt engineering with Chain of Thought (CoT). Specifically, the visibility of information is determined by user identity, named entity recognition is employed to anonymize personally identifiable information, and CoT prompting is incorporated to further enhance data privacy. Our experimental study demonstrates the effectiveness of the proposed privacy protection framework.
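The three components named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the role policy `VISIBLE_FIELDS`, the regex patterns standing in for a named entity recognition model, the record fields, and the wording of the CoT instruction are all hypothetical and chosen only to show how the pieces might fit together before a prompt reaches an LLM.

```python
import re

# Hypothetical role-to-field visibility policy (an assumption, not from the paper).
VISIBLE_FIELDS = {
    "physician": {"diagnosis", "medication", "notes"},
    "researcher": {"diagnosis"},
}

# Regex stand-ins for a real NER model; the patterns are illustrative only.
PII_PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}

# Hypothetical CoT-style instruction asking the model to check for identifiers.
COT_INSTRUCTION = (
    "Think step by step. First decide whether the answer would reveal "
    "personal identifiers. If it would, generalize or withhold them."
)


def filter_record(record, role):
    """Fine-grained access control: keep only fields the role may see."""
    allowed = VISIBLE_FIELDS.get(role, set())  # unknown roles see nothing
    return {k: v for k, v in record.items() if k in allowed}


def deidentify(text, known_names=()):
    """Replace known names and pattern-matched identifiers with placeholders."""
    for name in known_names:
        if name:  # skip empty strings, which would match everywhere
            text = re.sub(re.escape(name), "[NAME]", text)
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def build_prompt(question, record, role):
    """Combine access control, de-identification, and the CoT instruction."""
    names = [record.get("name", "")]
    visible = filter_record(record, role)
    context = "; ".join(f"{k}: {v}" for k, v in sorted(visible.items()))
    context = deidentify(context, names)
    question = deidentify(question, names)
    return f"{COT_INSTRUCTION}\nContext: {context}\nQuestion: {question}"


if __name__ == "__main__":
    record = {
        "name": "Jane Doe",
        "diagnosis": "hypertension",
        "notes": "Call Jane Doe at 555-123-4567 after 2024-03-01",
    }
    print(build_prompt("What is Jane Doe's diagnosis?", record, "physician"))
```

In this sketch the access-control step runs first, so fields a role may not see never enter the prompt, and de-identification is applied to whatever remains. A production system would replace the regex patterns with a trained NER model.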