Efficient Edge Inference by Selective Query

Anil Kag; Igor Fedorov; Aditya Gangrade; Paul Whatmough; Venkatesh Saligrama

Efficient Edge Inference by Selective Query

Anil Kag, Igor Fedorov, Aditya Gangrade, Paul Whatmough, Venkatesh Saligrama

Published: 01 Feb 2023, Last Modified: 01 Mar 2023ICLR 2023 posterReaders: Everyone

Keywords: efficient edge inference, low-capacity model, large scale prediction, dynamic neural networks, adaptive neural networks

TL;DR: Low-complexity model(edge) performs poorly on large-scale tasks; For efficient inference, it must learn to identify examples that benefit by querying; it has to identify both hard-to-classify examples and those that the cloud model would misclassify.

Abstract: Edge devices provide inference on predictive tasks to many end-users. However, deploying deep neural networks that achieve state-of-the-art accuracy on these devices is infeasible due to edge resource constraints. Nevertheless, cloud-only processing, the de-facto standard, is also problematic, since uploading large amounts of data imposes severe communication bottlenecks. We propose a novel end-to-end hybrid learning framework that allows the edge to selectively query only those hard examples that the cloud can classify correctly. Our framework optimizes over neural architectures and trains edge predictors and routing models so that the overall accuracy remains high while minimizing the overall latency. Training a hybrid learner is difficult since we lack annotations of hard edge-examples. We introduce a novel proxy supervision in this context and show that our method adapts seamlessly and near optimally across different latency regimes. On the ImageNet dataset, our proposed method deployed on a micro-controller unit exhibits $25\%$ reduction in latency compared to cloud-only processing while suffering no excess loss.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Submission Guidelines: Yes

Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning

Supplementary Material: zip

20 Replies

Loading