Keywords: fingerprinting, model sharing
Abstract: Large-scale democratization of machine learning has made model sharing commonplace, but it has also raised significant concerns around unauthorized usage, intellectual property violations, and model leakage. Model fingerprinting through memorization of fixed strings has emerged as a practical way to address these challenges for LLMs. However, prior research on fingerprint robustness has largely overlooked realistic adversarial conditions, generally assuming that an adversary who is unaware of the fingerprint queries cannot easily evade detection. We introduce a realistic adversarial threat model in which an attacker can uniformly modify the output distribution of an LLM to evade detection, without degrading utility on benign inputs and without requiring explicit knowledge of the fingerprint queries. Under this threat model, we present a novel family of sampling-based attacks capable of bypassing all existing fingerprinting schemes. To counteract these attacks, we propose a new paradigm based on approximate fingerprint detection and memorization, and provide concrete instantiations demonstrating its robustness and practicality. Our work highlights critical security vulnerabilities in current fingerprinting approaches and aims to encourage further research into robust fingerprinting methods that remain resilient under realistic adversarial scenarios.
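The sketch below is an illustrative reading of the threat model described in the abstract, not code from the submission: a hypothetical sampling wrapper (the names `perturbed_sample`, `generate`, and the stand-in `logits_fn` are all assumptions) that modifies every next-token distribution uniformly, irrespective of the query. Because string-memorization fingerprints rely on the model reproducing a fixed response verbatim, even small per-token perturbations can break exact-match detection while leaving benign outputs plausible.

```python
# Illustrative sketch only: uniformly perturb the output distribution at every
# decoding step, without any knowledge of which queries carry a fingerprint.
from typing import Optional
import numpy as np

def perturbed_sample(logits: np.ndarray, temperature: float = 1.3,
                     top_k: int = 50,
                     rng: Optional[np.random.Generator] = None) -> int:
    """Sample a token id from a uniformly modified next-token distribution."""
    rng = rng or np.random.default_rng()
    # Restrict to the top-k candidates so utility on benign inputs is largely preserved.
    top = np.argpartition(logits, -top_k)[-top_k:]
    scaled = logits[top] / temperature          # flatten the distribution
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(top, p=probs))

def generate(logits_fn, prompt_ids: list, max_new_tokens: int = 32) -> list:
    """Decoding loop with perturbed sampling; logits_fn stands in for the LLM's
    next-token logits given the current token ids (a hypothetical interface)."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        ids.append(perturbed_sample(logits_fn(ids)))
    return ids
```

Under these assumptions, a verifier that checks for an exact memorized fingerprint string would fail with high probability, which is the kind of evasion the proposed approximate detection paradigm is meant to withstand.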
Submission Number: 84