Keywords: model extraction, cryptanalytic extraction
TL;DR: We significantly improve the performance of high-fidelity model extraction and remove prior bottlenecks of the attack.
Abstract: Deep neural networks, costly to train and rich in intellectual property value, are
increasingly threatened by model extraction attacks that compromise their confidentiality.
Previous attacks have used cryptanalytical techniques to reverse-engineer model parameters
up to float64 precision for models trained on random data with at most three
hidden layers. However, the process was found
to be very time-consuming and infeasible for larger and deeper models trained on
standard benchmarks. Our study evaluates the feasibility of the parameter extraction
methods of Carlini et al. [1], further enhanced by Canales-Martínez et al. [2], for
models trained on standard benchmarks. We introduce a unified codebase that
integrates previous methods, and show that the choice of computational tools can significantly
influence performance. We develop further optimisations to the end-to-end attack
and improve the efficiency of extracting weight signs by up to 14.8 times compared
to former methods by identifying which neurons are easier and which are harder to extract.
Contrary to prior assumptions, we identify extraction of weights, not
extraction of weight signs, as the critical bottleneck. With our improvements, a
16,721-parameter model with 2 hidden layers trained on MNIST is extracted within
only 98 minutes, compared to at least 150 minutes previously. Finally, addressing
methodological deficiencies observed in previous studies, we propose new approaches to
robustly benchmarking future model extraction attacks.
Primary Area: Safety in machine learning
Submission Number: 19219