Keywords: Reasoning Protection
Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in solving complex problems through step-by-step reasoning. Recent studies have reported that this reasoning ability can be transferred to small language models (SLMs) by fine-tuning them on rationales generated by LLMs. Because training LLMs requires large amounts of data and computational resources, and the resulting models carry substantial value, this transferability raises significant concerns about intellectual property protection. Malicious users may exploit LLM APIs by querying them for high-quality responses, which can then be used to enhance the reasoning capabilities of their own models. Such illicit practices undermine the intellectual property embodied in the reasoning ability of the original models. In this paper, we investigate how to prevent the transfer of reasoning abilities in such a query-based "stealing" process. Our approach manipulates the outputs of LLMs while ensuring that legitimate users can still access them without disruption. To this end, we propose Unnoticeable Reasoning Editing (UREdit), which embeds imperceptible characters into LLM outputs, thereby preventing the transfer of reasoning ability. Furthermore, since queries are streamed through LLMs sequentially, we propose a Sample- and Token-Level Selection strategy to improve the imperceptibility of the edits. Extensive experiments validate the effectiveness of our method across various datasets and models, and further analysis explains why it works.
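To make the abstract's core idea concrete, a minimal sketch follows. It assumes the imperceptible edits are zero-width Unicode characters and approximates token-level selection with uniform random sampling; the abstract specifies neither, so the character choice, probability, and all names below are illustrative assumptions rather than the authors' method.

# Minimal sketch of embedding imperceptible characters into model output.
# Assumptions (not from the abstract): edits are zero-width Unicode
# characters, and token-level selection is approximated by uniform
# per-token sampling. All names here are hypothetical.
import random

ZERO_WIDTH = "\u200b"  # ZERO WIDTH SPACE: renders invisibly in most UIs

def edit_output(text: str, edit_prob: float = 0.2, seed: int = 0) -> str:
    """Insert zero-width characters after randomly selected tokens.

    A stand-in for token-level selection: a real system would score
    tokens (e.g., by their importance to the reasoning chain) instead
    of sampling uniformly.
    """
    rng = random.Random(seed)
    tokens = text.split(" ")
    edited = [
        tok + ZERO_WIDTH if rng.random() < edit_prob else tok
        for tok in tokens
    ]
    return " ".join(edited)

if __name__ == "__main__":
    answer = "Step 1: compute 3 * 4 = 12. Step 2: add 5 to get 17."
    edited = edit_output(answer)
    print(edited)            # looks identical to the original when rendered
    print(edited == answer)  # False whenever at least one token was edited

The legitimate user sees an unchanged rendering, while a model fine-tuned on the edited text ingests hidden characters that disrupt the transferred rationale.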
Paper Type: Long
Research Area: Discourse, Pragmatics, and Reasoning
Research Area Keywords: Safety, Reasoning
Languages Studied: English
Submission Number: 2676