Abstract: Deep neural networks (DNNs) are now ubiquitous in smart mobile applications such as the Internet of Things (IoT), which generate massive volumes of data at the network end. Cloud-edge computing unleashes the potential of distributed DNN inference by processing this data close to its source, meeting low-latency demands. However, it is challenging to optimally warm up the runtime dependencies of DNNs on resource-limited edge servers; failing to do so incurs cold starts whose unpredictable cost degrades inference performance. To address this challenge, we propose an inference acceleration approach, named INAA, for accelerating DNN inference under cold start in cloud-edge computing. Specifically, the problem is modeled as minimizing long-term inference time by jointly optimizing dependency planning and resource allocation. First, the optimal computing and bandwidth resource allocation is derived via convex optimization. Then, a distributed decision-making algorithm based on a weighted congestion game is developed for dependency planning. Finally, trace-driven comparative experiments validate that INAA outperforms existing approaches.
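To make the game-theoretic component of the abstract concrete, the following is a minimal, illustrative sketch (not the paper's INAA algorithm) of best-response dynamics in a weighted congestion game: each inference task picks an edge server on which to warm up its dependencies, and per-server delay grows with the total weight placed on it. The function names, the linear delay model, and the example numbers are all assumptions for illustration.

```python
# Hypothetical sketch of weighted-congestion-game best-response dynamics;
# the delay model and all identifiers are assumptions, not the paper's design.
from typing import List


def server_delay(base_delay: float, load: float) -> float:
    """Assumed linear congestion model: delay grows with the aggregate weight on a server."""
    return base_delay * (1.0 + load)


def best_response_placement(weights: List[float],
                            base_delays: List[float],
                            max_rounds: int = 100) -> List[int]:
    """Each task repeatedly switches to the server minimizing its own delay."""
    n_tasks, n_servers = len(weights), len(base_delays)
    choice = [0] * n_tasks                       # start: every task on server 0
    load = [0.0] * n_servers
    for i, w in enumerate(weights):
        load[choice[i]] += w

    for _ in range(max_rounds):
        changed = False
        for i, w in enumerate(weights):
            cur = choice[i]
            load[cur] -= w                       # tentatively remove task i
            # pick the server that minimizes task i's own congestion delay
            best = min(range(n_servers),
                       key=lambda s: server_delay(base_delays[s], load[s] + w))
            load[best] += w
            if best != cur:
                choice[i] = best
                changed = True
        if not changed:                          # pure Nash equilibrium reached
            break
    return choice


if __name__ == "__main__":
    # three tasks with different demands, two edge servers
    print(best_response_placement([1.0, 2.0, 0.5], [1.0, 1.5]))
```

Because weighted congestion games of this form admit pure Nash equilibria, the best-response loop terminates; the paper's distributed algorithm presumably exploits a similar convergence property, though its exact cost functions and message exchange are not specified in the abstract.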