TL;DR: Stealix is the first model stealing attack leveraging diffusion models against image classification models without relying on human-crafted prompts.
Abstract: Model stealing poses a significant security risk in machine learning by enabling attackers to replicate a black-box model without access to its training data, thus jeopardizing intellectual property and exposing sensitive information.
Recent methods that use pre-trained diffusion models for data synthesis improve efficiency and performance but rely heavily on manually crafted prompts, limiting automation and scalability, especially for attackers with little expertise.
To assess the risks posed by open-source pre-trained models, we propose a more realistic threat model that eliminates the need for prompt design skills or knowledge of class names.
In this context, we introduce Stealix, the first approach to perform model stealing without predefined prompts. Stealix uses two open-source pre-trained models to infer the victim model’s data distribution, and iteratively refines prompts through a genetic algorithm, progressively improving the precision and diversity of synthetic images.
Our experimental results demonstrate that Stealix significantly outperforms other methods, even those with access to class names or fine-grained prompts, while operating under the same query budget. These findings highlight the scalability of our approach and suggest that the risks posed by pre-trained generative models in model stealing may be greater than previously recognized.
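The genetic prompt refinement described above can be illustrated with a minimal, self-contained sketch. The paper does not publish this exact loop here, so the fitness function, vocabulary, and operators below are toy stand-ins: in the real attack, fitness would come from generating images with a diffusion model and counting how many the black-box victim assigns to the target class.

```python
import random

random.seed(0)

# Toy token vocabulary; in the actual attack, candidate tokens would be
# derived from open-source pre-trained models, not hand-picked (assumption).
VOCAB = ["dog", "puppy", "canine", "photo", "grass", "outdoor", "car", "sky"]
TARGET = {"dog", "photo", "grass"}  # tokens the stand-in victim "rewards"

def fitness(prompt):
    # Stand-in for: synthesize images from the prompt with a diffusion model,
    # query the victim model, and score the fraction classified as the target class.
    return len(set(prompt) & TARGET) / len(TARGET)

def mutate(prompt):
    # Replace one random token with a random vocabulary token.
    child = list(prompt)
    child[random.randrange(len(child))] = random.choice(VOCAB)
    return child

def crossover(a, b):
    # Single-point crossover between two parent prompts.
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(pop_size=8, prompt_len=3, generations=30):
    pop = [[random.choice(VOCAB) for _ in range(prompt_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # elitist selection: keep fittest half
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(best, fitness(best))
```

Each generation spends queries only on images from the current prompt population, so the elitist selection step is what lets the attack stay within a fixed query budget while fitness improves monotonically.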
Lay Summary: Machine learning models are valuable intellectual property, but they can be stolen by simply querying them and using the outputs to train a similar model. Prior attacks use image generators with hand-crafted prompts, assuming attackers have prompt expertise. This assumption understates the threat and overlooks how easily models can be copied. Our work shows that even attackers with no prompt knowledge can effectively copy a model, raising the alarm about how easy model theft has become.
We introduce Stealix, the first fully automated model stealing method that begins with just one real image per class. Stealix evolves text prompts that generate synthetic images, which the target model labels as belonging to the correct class. Without any human-crafted prompts or knowledge of class names, Stealix creates training data that mimics the target model’s data and leads to stronger stolen models than prior methods.
Our findings show that open-source image generators greatly reduce the effort needed for model theft. This raises concerns about their misuse and calls for more responsible deployment and stronger defenses against emerging model stealing threats.
Link To Code: https://zhixiongzh.github.io/stealix/
Primary Area: Social Aspects->Security
Keywords: model stealing, security, genetic algorithm, prompt optimization
Submission Number: 2811