Abstract: Private inference applies cryptographic techniques such as homomorphic encryption, garbled circuits, and secret sharing to preserve the privacy of both parties in a client-server setting during inference. It is often hindered by high communication overhead, especially at non-linear activation layers such as ReLU. ReLU pruning has therefore been widely recognized as an effective way to accelerate private inference. Existing approaches to ReLU pruning typically rely on coarse hypotheses, such as assuming an inverse correlation between the importance of ReLU and linear layers, or that shallow activation layers are universally less important, to assign per-layer ReLU budgets while preserving inference accuracy. However, these assumptions rest on limited empirical evidence and can fail to generalize to diverse model architectures. In this work, we introduce a finer-granularity ReLU budget assignment approach that assesses the layer-wise importance of ReLU with the Shapley value.
To address the computational burden of exact Shapley value calculation, we propose a tree-trimming algorithm for fast estimation. We provide both theoretical guarantees and empirical validation of our method. Extensive experiments show that our method achieves better efficiency and accuracy than the state of the art across diverse model architectures, activation functions, and datasets. Specifically, we need $\sim$$2.5\times$ fewer ReLU operations to achieve similar inference accuracy and gain up to $\sim$$8.13\%$ in inference accuracy under similar ReLU budgets.
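For readers unfamiliar with Shapley-value attribution of activation layers, the sketch below illustrates a plain Monte Carlo permutation estimator of layer-wise ReLU importance. It is an illustrative baseline only, not the paper's tree-trimming estimator (which is not specified here); the names `relu_names`, `accuracy_fn`, and `n_perms` are hypothetical placeholders introduced for this example.

```python
import random

def shapley_relu_importance(relu_names, accuracy_fn, n_perms=50):
    """Monte Carlo estimate of each ReLU layer's Shapley value.

    A coalition is a set of ReLU layers kept active; layers outside the
    coalition are treated as replaced by the identity. `accuracy_fn(active)`
    is assumed to return validation accuracy of the model with only the
    ReLUs in `active` enabled (how to toggle ReLUs is model-specific).
    """
    values = {name: 0.0 for name in relu_names}
    for _ in range(n_perms):
        perm = random.sample(relu_names, len(relu_names))
        active = set()
        prev_acc = accuracy_fn(active)        # accuracy with no ReLUs active
        for name in perm:
            active.add(name)
            acc = accuracy_fn(active)         # accuracy after enabling this layer
            values[name] += (acc - prev_acc) / n_perms   # marginal contribution
            prev_acc = acc
    return values
```

Under this view, a layer with a high estimated value would receive a larger share of the ReLU budget; exact computation requires exponentially many coalition evaluations, which is the cost the paper's fast estimation targets.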
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Chuan_Guo1
Submission Number: 5047