Abstract: With the fast growth of mobile edge computing (MEC), the deep neural network (DNN) has gained more opportunities in application to various mobile services. Given the tremendous number of learning parameters and large model size, the DNN model is often trained in cloud center and then dispatched to end devices for inference via edge network. Therefore, maximizing the cost-efficiency of learned model dispatch in the edge network would be a critical problem for the model serving in various application contexts. To reach this goal, in this article we focus mainly on reducing the total model dispatch cost in the edge network while maintaining the efficiency of the model inference. We first study this problem in its off-line form as a baseline where a sequence of $n$ requests can be pre-defined in advance and exploit dynamic programming techniques to obtain a fast optimal algorithm in time complexity of $O(m^{2}n)$ under a semi-homogeneous cost model in a $m$-sized network. Then, we design and implement a 2.5-competitive algorithm for its online case with a provable lower bound of 2 for any deterministic online algorithm. We verify our results through careful algorithmic analysis and validate their actual performance via a trace-based study based on a public open international mobile network dataset.
Loading