Keywords: Model Extraction; Graph Neural Networks; Trustworthy Machine Learning
TL;DR: We defend GNNs against model extraction by reprogramming them with graph-structure disturbances and layer-wise noise, blocking attackers while preserving utility and improving efficiency over prior defenses.
Abstract: The goal of model extraction (ME) on Graph Neural Networks (GNNs) is to steal the functionality of GNN models. Defending GNNs against model extraction faces several challenges: (1) existing defenses are primarily designed for convolutional neural networks and do not account for the graph structure of GNNs; (2) watermark-based defenses are passive: they cannot prevent model extraction from happening and can only identify a stolen model after extraction has occurred; (3) existing methods either require defensive training entirely from scratch or incur expensive computation during inference. To address these limitations, we propose an effective defense that reprograms the target model with graph structure-based and layer-wise noise to prevent ME on GNNs while maintaining model utility. Specifically, we reprogram the target model to: (1) introduce graph structure-based disturbances that prevent the attacker from fully learning its functionality; (2) incorporate data-specific, layer-wise noise that strengthens the defense while preserving utility. As a result, the attacker cannot extract the reprogrammed target model, and the model's utility is preserved with improved inference efficiency. Extensive experiments and analysis on defending against both hard-label and soft-label ME for GNNs demonstrate that our strategy lessens the effectiveness of existing attack strategies while maintaining the target model's utility for benign queries.
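To make the second component concrete, below is a minimal sketch of how data-specific, layer-wise noise could be injected into a GNN layer's output. This is a hypothetical illustration only: the abstract does not specify the paper's actual reprogramming procedure or noise schedule, and the class name `NoisyGCNLayer` and the `noise_scale` parameter are assumptions introduced here for exposition.

```python
import torch
import torch.nn as nn


class NoisyGCNLayer(nn.Module):
    """One GCN-style layer whose output carries additive, data-dependent noise.

    Hypothetical sketch: the paper's actual noise design is not given in the
    abstract; this only illustrates the general idea of layer-wise noise.
    """

    def __init__(self, in_dim: int, out_dim: int, noise_scale: float = 0.1):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.noise_scale = noise_scale  # assumed per-layer hyperparameter

    def forward(self, x: torch.Tensor, adj_norm: torch.Tensor) -> torch.Tensor:
        # Standard GCN propagation: normalized adjacency times transformed features.
        h = adj_norm @ self.linear(x)
        # Data-specific noise: scale the perturbation by each node's
        # representation magnitude, so benign predictions change little
        # while the exposed outputs become harder for an attacker to imitate.
        sigma = self.noise_scale * h.detach().abs().mean(dim=-1, keepdim=True)
        return h + sigma * torch.randn_like(h)
```

One design consideration such a scheme would face is calibrating `noise_scale` per layer so that the perturbation degrades an extractor's surrogate training signal without flipping the argmax prediction on benign queries.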
Primary Area: learning on graphs and other geometries & topologies
Submission Number: 21053