Abstract: Quantization Neural Networks (QNNs) have been widely adopted on resource-constrained edge devices due to their real-time capability and low resource requirements. However, deployed models are white-box accessible, raising concerns about model theft. To address this issue, TEE-shielded secure inference has been introduced as a secure and efficient solution. Nevertheless, existing methods neglect compatibility with 8-bit quantized computation, leading to severe integer overflow during inference. This overflow can degrade a QNN's accuracy to the level of random guessing, completely destroying model utility. Moreover, model confidentiality and inference integrity also face substantial threats due to the limited data representation space. To safeguard accurate and efficient inference for QNNs, we propose TEE-Shielded QNN Partition (TSQP), which builds on three key insights. First, a Quantization Manager converts white-box inference into black-box inference by shielding critical quantization scales inside the TEE, and eliminates overflow through reduced-range computation. Second, leveraging Information Bottleneck theory during training, we introduce Parameter De-Similarity to defend against powerful model-stealing attacks to which existing methods are vulnerable. Third, an Integrity Monitor detects inference-integrity breaches in an oblivious manner, whereas existing methods can be bypassed because they lack obliviousness. Experimental results demonstrate that TSQP maintains high accuracy and accurately detects integrity breaches. Our method achieves more than an 8x speedup over full TEE inference while reducing model-stealing attack accuracy from 3.99x to 1.29x. To the best of our knowledge, TSQP is the first TEE-shielded secure inference solution that simultaneously achieves model confidentiality, inference integrity, and model utility for QNNs.
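The integer-overflow problem the abstract refers to can be illustrated with a minimal sketch (not taken from the paper): in 8-bit quantized inference, a dot product of int8 activations and weights can far exceed the 16-bit range over a long reduction, so a too-narrow accumulator silently wraps around, while a 32-bit accumulator is exact.

```python
def wrap16(v: int) -> int:
    """Simulate two's-complement int16 wrap-around (overflow)."""
    return (v + 2**15) % 2**16 - 2**15

# Worst-case int8 dot product over a 1024-long reduction (illustrative values).
x = [127] * 1024
w = [127] * 1024

exact = sum(a * b for a, b in zip(x, w))   # what int32 accumulation yields
acc16 = 0
for a, b in zip(x, w):
    acc16 = wrap16(acc16 + a * b)          # 16-bit accumulator wraps repeatedly

print(exact, acc16)  # prints 16516096 1024 -- the wrapped result is garbage
```

This is why practical quantized kernels widen to 32-bit accumulators; the abstract's "reduced-range" approach instead constrains operand ranges so that such wrap-around cannot occur within the shielded computation.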