Abstract: As mobile devices become pervasive, deep learning applications on them are increasingly part of daily life. However, deep learning tasks are computationally intensive, and the limited computing resources of mobile devices cannot run these applications efficiently. A common approach is to transmit the data from the mobile device and offload the computation to the cloud, but the resulting data transmission delay may become the performance bottleneck. In this paper, we bring an emerging concept, edge computing, to mobile deep learning applications. Compared with cloud computing, edge computing can significantly reduce the communication delay. Building on this, we note that deep neural networks can be partitioned at the layer level so that the computation load is distributed more evenly among the device, the edge, and the cloud, which further reduces the overall execution delay. We propose a framework called DeePar, which exploits all available resources on the device, the edge server, and the cloud to collaboratively optimize inference performance. We also formulate a scheduling problem for multi-task execution and propose an efficient solution. Both our prototype experiments and extensive simulations show that DeePar can achieve up to 80% delay reduction.
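To make the idea of layer-level partitioning concrete, the following is a minimal sketch and not DeePar's actual algorithm. It assumes a purely sequential network, hypothetical per-layer execution times on each tier, hypothetical tensor sizes at each layer boundary, and that the cloud is reached through the edge. All names (`partition_latency`, `best_partition`, the bandwidth parameters) are illustrative. It exhaustively tries every pair of cut points and keeps the one with the lowest end-to-end delay; the cost of returning the result to the device is ignored.

```python
def partition_latency(i, j, device_ms, edge_ms, cloud_ms, size_kb,
                      bw_dev_edge, bw_edge_cloud):
    """Delay (ms) when layers [0, i) run on the device, [i, j) on the edge,
    and [j, n) on the cloud. size_kb[k] is the data entering layer k;
    bandwidths are in KB per ms. All inputs are assumed measurements."""
    n = len(device_ms)
    total = sum(device_ms[:i]) + sum(edge_ms[i:j]) + sum(cloud_ms[j:])
    if i < n:  # something runs off-device, so data crosses device -> edge
        total += size_kb[i] / bw_dev_edge
    if j < n:  # something runs in the cloud, so data crosses edge -> cloud
        total += size_kb[j] / bw_edge_cloud
    return total


def best_partition(device_ms, edge_ms, cloud_ms, size_kb,
                   bw_dev_edge, bw_edge_cloud):
    """Brute-force search over all cut points 0 <= i <= j <= n."""
    n = len(device_ms)
    best = None
    for i in range(n + 1):
        for j in range(i, n + 1):
            t = partition_latency(i, j, device_ms, edge_ms, cloud_ms,
                                  size_kb, bw_dev_edge, bw_edge_cloud)
            if best is None or t < best[2]:
                best = (i, j, t)
    return best


# Hypothetical 4-layer network; numbers are made up for illustration.
device_ms = [5, 40, 40, 40]
edge_ms = [2, 8, 8, 8]
cloud_ms = [1, 2, 2, 2]
size_kb = [100, 20, 20, 5, 1]   # n + 1 boundary sizes
i, j, delay = best_partition(device_ms, edge_ms, cloud_ms, size_kb,
                             bw_dev_edge=10.0, bw_edge_cloud=1.0)
print(i, j, delay)
```

With these made-up numbers the search places the first layer on the device, the next two on the edge, and the last one on the cloud, because the early layer shrinks the data before the device-to-edge transfer and the slow edge-to-cloud link is only crossed once the tensor is small.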