TRIDENT: An Efficient Data-Free Model Extraction Attack for Graph Neural Networks

16 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Graph Neural Networks, Model Extraction Attack, Machine Learning as a Service
Abstract: Graph Neural Networks (GNNs) are increasingly offered as Machine-Learning-as-a-Service (MLaaS) via APIs. However, this deployment model exposes them to model extraction attacks (MEAs), in which adversaries aim to reconstruct or steal proprietary GNN models using only query access to the service. Although recent work has developed data-free MEAs that perform well in both transductive and inductive settings, the field still lacks a unified theoretical account of when and why such attacks succeed, and of how an adversary should schedule queries under strict budget constraints. In this paper, we bridge this gap by introducing the first theory-driven framework for MEAs against GNNs. This framework highlights that not all queries are equally effective, guiding more strategic, budget-constrained query scheduling. It also leverages the surrogate model's white-box access to improve its alignment with the black-box victim. To answer the key question of which queries enable effective MEAs, we formalize the extraction risk and derive a bound based on the generalization discrepancy between the query distribution and the victim model's unseen training distribution. Guided by this analysis, we propose TRIDENT, which strategically schedules queries, particularly under strict budget constraints. Extensive experiments on six real-world benchmarks and three GNN backbones show that our method achieves state-of-the-art performance. These results validate both the theoretical contributions and the practical efficiency of our approach.
Supplementary Material: pdf
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 6728