Abstract: The large number of ReLU non-linearity operations in existing deep neural networks makes them ill-suited for latency-efficient private inference (PI). Existing techniques to reduce ReLU operations often involve manual effort and sacrifice significant accuracy. In this paper, we first present a novel measure of non-linearity layers' ReLU sensitivity, which removes the time-consuming manual effort otherwise required to identify such layers. Based on this sensitivity, we then present SENet, a three-stage training method that, for a given ReLU budget, automatically assigns per-layer ReLU counts, decides the ReLU locations within each layer's activation map, and trains a model with significantly fewer ReLUs to potentially yield latency- and communication-efficient PI. Experimental evaluations with multiple models on various datasets show SENet's superior performance in terms of both reduced ReLU count and improved classification accuracy compared to existing alternatives. In particular, SENet can yield models that require up to ∼2× fewer ReLUs while maintaining similar accuracy, and for a similar ReLU budget it can improve classification accuracy by ∼2.32%, evaluated on CIFAR-100.
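As a rough illustration of the budget-assignment step described above, the sketch below distributes a global ReLU budget across layers in proportion to a per-layer sensitivity score, capped by each layer's activation count. The scoring values and the proportional allocation rule are illustrative assumptions for exposition only, not the paper's actual algorithm.

```python
import numpy as np

def allocate_relu_budget(sensitivities, layer_sizes, total_budget):
    """Distribute a global ReLU budget across layers in proportion to a
    per-layer sensitivity score, capped at each layer's activation count.
    (Illustrative allocation rule; SENet's actual assignment may differ.)"""
    sens = np.asarray(sensitivities, dtype=float)
    sizes = np.asarray(layer_sizes, dtype=int)
    weights = sens / sens.sum()
    alloc = np.minimum(np.floor(weights * total_budget).astype(int), sizes)
    # Redistribute any leftover budget to layers with remaining capacity,
    # favoring the most sensitive layers first.
    leftover = total_budget - alloc.sum()
    for i in np.argsort(-sens):
        if leftover <= 0:
            break
        extra = min(leftover, sizes[i] - alloc[i])
        alloc[i] += extra
        leftover -= extra
    return alloc

# Example: four layers with hypothetical sensitivity scores and a 50k-ReLU budget.
print(allocate_relu_budget([0.9, 0.6, 0.3, 0.1],
                           [65536, 32768, 16384, 8192],
                           50_000))
```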