CEGA: A Cost-Effective Approach for Graph-Based Model Extraction and Acquisition

Zebin Wang; Menghan Lin; Bolin Shen; Ken Anderson; Molei Liu; Tianxi Cai; Yushun Dong

Published: 01 May 2025, Last Modified: 23 Jul 2025, ICML 2025 poster, License: CC BY 4.0

TL;DR: This paper introduces CEGA, a novel Graph Neural Network (GNN) model extraction framework that efficiently selects query nodes under strict budget constraints and outperforms alternative approaches.

Abstract: Graph Neural Networks (GNNs) have demonstrated remarkable utility across diverse applications, and their growing complexity has made Machine Learning as a Service (MLaaS) a viable platform for scalable deployment. However, this accessibility also exposes GNNs to serious security threats, most notably model extraction attacks (MEAs), in which adversaries strategically query a deployed model to construct a high-fidelity replica. In this work, we evaluate the vulnerability of GNNs to MEAs and explore the potential of such extraction for cost-effective model acquisition in non-adversarial research settings. Adaptive node querying strategies can also play a critical role in research, particularly when labeling data is expensive or time-consuming: by selectively sampling informative nodes, researchers can train high-performing GNNs with minimal supervision, which is especially valuable in domains such as biomedicine, where annotations often require expert input. To this end, we propose a node querying strategy tailored to a highly practical yet underexplored scenario in which bulk queries are prohibited and only a limited set of initial nodes is available. Our approach iteratively refines the node selection mechanism over multiple learning cycles, leveraging historical feedback to improve extraction efficiency. Extensive experiments on benchmark graph datasets show that our approach outperforms comparable baselines in accuracy, fidelity, and F1 score under strict query-size constraints. These results highlight both the susceptibility of deployed GNNs to extraction attacks and the promise of ethical, efficient GNN acquisition methods for low-resource research environments. Our implementation is publicly available at [https://github.com/LabRAI/CEGA](https://github.com/LabRAI/CEGA).
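The budgeted, cycle-by-cycle querying setting described in the abstract can be illustrated with a minimal sketch. It is not CEGA's selection criterion (the linked repository holds the actual implementation). It assumes a hypothetical `oracle` function standing in for the deployed target GNN, a toy graph stored as an adjacency dict, a hypothetical coverage heuristic `select_node`, and a neighbor majority vote (`surrogate_predict`) in place of a trained surrogate GNN. It also assumes the initial nodes count against the query budget. Fidelity is computed as the fraction of nodes on which the surrogate agrees with the target.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List

Graph = Dict[int, List[int]]  # toy graph as an adjacency list
Oracle = Callable[[int], int]  # hypothetical: returns the target model's label for a node


def surrogate_predict(graph: Graph, labels: Dict[int, int], node: int, default: int = 0) -> int:
    """Stand-in for a surrogate GNN: majority label among queried neighbors."""
    if node in labels:
        return labels[node]
    votes = Counter(labels[m] for m in graph[node] if m in labels)
    return votes.most_common(1)[0][0] if votes else default


def select_node(graph: Graph, labels: Dict[int, int]) -> int:
    """Hypothetical coverage heuristic (not CEGA's criterion): pick the unqueried
    node with the most unqueried neighbors; ties go to the smallest node id."""
    candidates = [n for n in graph if n not in labels]
    return max(candidates, key=lambda n: (sum(m not in labels for m in graph[n]), -n))


def budgeted_extraction(graph: Graph, initial_nodes: Iterable[int], oracle: Oracle, budget: int) -> Dict[int, int]:
    """Query one node per cycle until the total query budget (initial nodes
    included, an assumption) is spent; each choice uses all earlier answers."""
    labels = {n: oracle(n) for n in initial_nodes}
    while len(labels) < min(budget, len(graph)):
        node = select_node(graph, labels)
        labels[node] = oracle(node)  # one query to the deployed model per cycle
    return labels


def fidelity(graph: Graph, labels: Dict[int, int], oracle: Oracle) -> float:
    """Fraction of nodes where the surrogate agrees with the target model.
    Evaluation queries are assumed not to count against the budget."""
    agree = sum(surrogate_predict(graph, labels, n) == oracle(n) for n in graph)
    return agree / len(graph)


if __name__ == "__main__":
    # Two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
    toy = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
    target = lambda n: 0 if n < 3 else 1  # simulated deployed model
    queried = budgeted_extraction(toy, [0], target, budget=2)
    print(queried, fidelity(toy, queried, target))
```

Starting from node 0 with a budget of two queries, the heuristic picks node 3 (the node with the most unqueried neighbors), and the surrogate then agrees with the target on all six nodes; with a budget of one, the second triangle stays unlabeled and fidelity drops to 0.5. A realistic pipeline would replace the heuristic with a learned selection criterion and retrain a GNN after each cycle.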

Lay Summary: Graph Neural Networks (GNNs) are tools that help us understand complex data with a graph structure, such as how people are connected in a social network or how medicines participate in human metabolism. However, training these models usually requires significant computational resources and a large number of labeled samples, which can be costly or difficult to obtain. For convenience, many graph-based machine learning models are hosted on online servers, raising concerns that their functionality could be extracted by adversaries, which can cause significant financial and reputational loss. In our paper, we investigate strategies for extracting the functionality of graph-based models and evaluate their potential for promoting ethical model acquisition and for inspiring defenses against malicious model extraction. Specifically, we consider a practical situation in which we cannot send many queries to the server upfront or all at once. To help other researchers explore this highly practical yet less studied field, we have created **CEGA**, an easy-to-use tool that enables researchers to apply our approach to their graph learning tasks. With CEGA, people can train effective GNNs for their tasks by _learning_ from existing strong models with minimal supervision.

Link To Code: https://github.com/LabRAI/CEGA

Primary Area: Deep Learning->Graph Neural Networks

Keywords: GNN Model Extraction, Cost Efficiency, Budget Limitation

Submission Number: 13870
