Abstract: Machine Learning as a Service (MLaaS) is an innovative framework that enables a broad range of users to capitalize on powerful Artificial Intelligence (AI) technologies. Nevertheless, MLaaS raises privacy concerns for both the client's data and the server's model. To address this issue, several Secure Inference (SI) frameworks for MLaaS that take advantage of Homomorphic Encryption (HE) operations have been proposed in the literature. However, the computation cost of these frameworks remains high, especially for real-time applications. In this paper, we propose TILE, a novel input-structure optimization system for neural networks that accelerates SI. The goal of TILE is to reduce both the linear and non-linear computation costs, as well as the non-linear communication costs, of MLaaS while maintaining model accuracy. TILE defines two novel HE-friendly input structures, the Internal Tile and External Tile structures, which reduce the number of HE operations required for SI. We also develop a search mechanism that identifies the optimal locations at which to apply these input structures. We apply TILE to widely used models, such as VGG and ResNet, and to the CIFAR-10 and Tiny-ImageNet datasets. The experimental results demonstrate that TILE effectively reduces the computation time, achieving up to a 51.57% reduction for a state-of-the-art SI framework. Furthermore, TILE can also be applied to models that have already been pruned, further reducing the overall computation time by 25.90%.