Taming the Babel of Queries: Enhancing RAG Systems for Intellectual Property via Synthetic Multi-Perspective Fine-tuning

ACL ARR 2025 February Submission7668 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0

Abstract: NLP systems in the intellectual property (IP) field face significant challenges because users express queries in diverse ways, including colloquial language and ambiguous terms. This diversity hinders the effectiveness of Retrieval-Augmented Generation (RAG) systems in the IP field. In this paper, we propose a novel Multi-Angle Question Generation and Retrieval Fine-Tuning Method (MQG-RFM) that leverages large language models (LLMs) as agents to simulate diverse user queries. By generating multiple variations of each query and fine-tuning the retrieval model with hard negative mining, MQG-RFM improves retrieval accuracy and answer generation quality in patent-related Q&A scenarios. MQG-RFM is a simple and generalizable solution that requires no complex architectural changes, making it an efficient and scalable method for personalized deployment in small and medium-sized IP agencies. Experimental results on a Taiwan patent Q&A dataset show a 185.62% improvement in retrieval accuracy on the Patent Consultation dataset and a 262.26% improvement on the Novel Patent Technology Report dataset, with 14.22% and 53.58% improvements in generation quality, respectively, over the baselines.
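
The abstract does not say how hard negatives are selected, so the following is only a minimal sketch of one common approach, not the authors' implementation. It assumes each query variant and passage already has an embedding (the three-dimensional vectors here are made-up toy values), and that a hard negative is a non-matching passage that ranks close to the query by cosine similarity. The LLM-based query generation step is omitted; the two query variants are hand-written placeholders, and all function and passage names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mine_hard_negatives(query_vec, positive_id, corpus, k=2):
    """Return ids of the k non-positive passages most similar to the query."""
    scored = [
        (cosine(query_vec, vec), doc_id)
        for doc_id, vec in corpus.items()
        if doc_id != positive_id  # the gold passage is never a negative
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


def build_triplets(query_variants, positive_id, corpus, k=2):
    """Pair every query variant with its gold passage and each mined negative."""
    triplets = []
    for query_text, query_vec in query_variants:
        for neg_id in mine_hard_negatives(query_vec, positive_id, corpus, k):
            triplets.append((query_text, positive_id, neg_id))
    return triplets


# Toy passage embeddings (assumed values, for illustration only).
corpus = {
    "patent_fee": [0.9, 0.1, 0.0],
    "patent_term": [0.8, 0.3, 0.1],
    "trademark_fee": [0.6, 0.0, 0.7],
    "copyright_faq": [0.0, 0.2, 0.9],
}

# Two phrasings of the same question, standing in for LLM-generated variants.
query_variants = [
    ("How much does filing a patent cost?", [1.0, 0.1, 0.1]),
    ("what's the price to patent my idea", [0.9, 0.0, 0.3]),
]

triplets = build_triplets(query_variants, "patent_fee", corpus, k=2)
for triplet in triplets:
    print(triplet)
```

Triplets of this form (query, positive passage, hard negative) are the usual input to a contrastive or triplet loss when fine-tuning a retrieval encoder; which loss and encoder MQG-RFM uses is not stated in the abstract.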

Paper Type: Long

Research Area: NLP Applications

Research Area Keywords: intellectual property; Retrieval-Augmented Generation; Large Language Model

Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models, Data resources

Languages Studied: Chinese

Submission Number: 7668
