Toward Trustworthy: A Method for Detecting Fine-Tuning Origins in LLMs

Hongyu Liang; Yihan Li; Yuting Zheng; Yiran Zhang; Hanbo Huang; Lin Liu; Shiyu Liang

Toward Trustworthy: A Method for Detecting Fine-Tuning Origins in LLMs

Hongyu Liang, Yihan Li, Yuting Zheng, Yiran Zhang, Hanbo Huang, Lin Liu, Shiyu Liang

16 Sept 2024 (modified: 05 Feb 2025)Submitted to ICLR 2025EveryoneRevisionsBibTeXCC BY 4.0

Keywords: Fine-Tuning Origins Detection, LoRA, LLM

Abstract: As large language models (LLMs) continue to advance, their deployment often involves fine-tuning to enhance performance on specific downstream tasks. However, this customization is sometimes accompanied by misleading claims about the origins, raising significant concerns about transparency and trust within the open-source community. Existing model verification techniques typically assess functional, representational, and weight similarities. However, these approaches often struggle against obfuscation techniques, such as permutations and scaling transformations, that obscure a model's lineage. To address this limitation, we propose a novel detection method that rigorously determines whether a model has been fine-tuned from a specified base model. This method includes the ability to extract the LoRA rank utilized during the fine-tuning process, providing a more robust verification framework. This framework is the first to provide a formalized approach specifically aimed at pinpointing the sources of model fine-tuning. We empirically validated our method on twenty-nine diverse open-source models under conditions that simulate real-world obfuscation scenarios. We empirically analyze the effectiveness of our framework and finally, discuss its limitations. The results demonstrate the effectiveness of our approach and indicate its potential to establish new benchmarks for model verification.

Primary Area: other topics in machine learning (i.e., none of the above)

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 1054

Loading