Abstract: While machine learning as a service (MLaaS) enables users to leverage powerful pre-trained models at low cost, it also poses significant intellectual property risks for model builders. A widely adopted defense mechanism is to embed ownership credentials (e.g., watermarking information) into the model, allowing the verifier to examine them during ownership verification. Despite the effectiveness of such watermark-based schemes, we identify a critical vulnerability, termed the model defamation attack (MDA). If an adversary compromises a verifier and obtains the submitted credential, it can reuse this stolen information by embedding it into a malicious model $\hat{E}$ and falsely attributing its ownership to the original model builder, thereby damaging the builder's reputation. This article presents a generic anti-MDA model certification (GAMC) framework that can be seamlessly integrated with existing watermarking schemes to enhance their robustness. We identify two design goals: (1) credential confidentiality, which prevents sensitive watermark-related information from being leaked during verification; and (2) credential non-reusability, which prevents the reuse of any previously submitted credentials. To this end, we employ a cryptographic accumulator to construct the model certification mechanism and design a customized $\Sigma_{OR}$ protocol as a complement. We formally prove the security of our anti-MDA framework and conduct extensive experiments across various watermarking schemes and neural network architectures to evaluate its compatibility and effectiveness. The experimental results show that GAMC improves robustness against model defamation attacks by 86.67% with negligible overhead, demonstrating its practicality for real-world deployment.
External IDs: dblp:journals/tdsc/HaoLHH25