Taming the Babel of Queries: Enhancing RAG Systems for Intellectual Property via Synthetic Multi-Perspective Fine-tuning
Abstract: NLP systems in the intellectual property (IP) field face significant challenges due to the diverse ways in which users express queries, such as colloquial language and ambiguous terms. These issues hinder the effectiveness of Retrieval-Augmented Generation (RAG) systems in the IP field. In this paper, we propose a novel Multi-Angle Question Generation and Retrieval Fine-Tuning Method (MQG-RFM) that leverages large language models (LLMs) as agents to simulate diverse user queries. By generating multiple query variations and fine-tuning the retrieval model with hard negative mining, MQG-RFM improves retrieval accuracy and answer generation quality in patent-related Q&A scenarios. MQG-RFM offers a simple and generalizable solution that requires no complex architectural changes, making it an efficient and scalable method for personalized deployment in small and medium-sized IP agencies. Experimental results on a Taiwan patent Q&A dataset show a 185.62% improvement in retrieval accuracy on the Patent Consultation dataset and a 262.26% improvement on the Novel Patent Technology Report dataset, with 14.22% and 53.58% improvements in generation quality, respectively, over the baselines.
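To make the "multiple query variations + hard negative mining" idea concrete, here is a minimal sketch of how training triples for a retriever could be assembled. It is not the authors' implementation: the LLM-generated variants are hardcoded, the bag-of-words `cosine` score is a hypothetical stand-in for the retriever's embedding similarity, and `build_triples`, `corpus`, and `k` are illustrative names. The toy passages are in English for readability (the paper studies Chinese, which would need a different tokenizer).

```python
from collections import Counter
import math


def cosine(a: str, b: str) -> float:
    # Hypothetical stand-in for the retriever's score: bag-of-words cosine similarity.
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_triples(variants, positive_id, corpus, k=1):
    """For each query variant, pair the gold passage with the k highest-scoring
    non-gold passages (the "hard negatives") as (query, positive, negative) triples."""
    triples = []
    for q in variants:
        negatives = sorted(
            (pid for pid in corpus if pid != positive_id),
            key=lambda pid: cosine(q, corpus[pid]),
            reverse=True,
        )[:k]
        for neg in negatives:
            triples.append((q, corpus[positive_id], corpus[neg]))
    return triples


# Toy corpus; in MQG-RFM the variants would come from an LLM rewriting one seed question.
corpus = {
    "p1": "a patent application must be filed before public disclosure of the invention",
    "p2": "a trademark application protects a brand name or logo",
    "p3": "the patent term is twenty years from the filing date",
}
variants = [
    "when do I need to file a patent",
    "can I tell people about my invention before filing a patent",
]
triples = build_triples(variants, "p1", corpus)
for q, pos, neg in triples:
    print(q, "->", neg)
```

Each variant yields its own training signal against the same gold passage, which is one plausible reading of how multi-angle queries expose the retriever to colloquial phrasings; the triples could then feed a contrastive loss.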
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: intellectual property; Retrieval-Augmented Generation; Large Language Model
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models, Data resources
Languages Studied: Chinese
Submission Number: 7668