Abstract: Emerging mobile applications, such as cognitive assistance based on deep neural networks (DNNs), require both low latency and high computational power. To meet these requirements, edge computing (also called fog computing) has been proposed, which offloads computation to edge servers located near mobile clients. This paradigm shift from cloud to edge requires new computing infrastructure in which edge servers are pervasively distributed over a region. This paper presents PerDNN, a system that executes the DNNs of mobile clients collaboratively with pervasive edge servers. PerDNN dynamically partitions DNN computation between a client and an edge server to minimize execution latency. It predicts the next edge server the client will visit, calculates a speculative partitioning plan, and transfers the server-side DNN layers to the predicted server in advance, which removes the initialization overhead needed to start offloading and thus avoids cold starts. This proactive transfer does not incur excessive network traffic between edge servers, since only a tiny fraction of the server-side DNN layers is migrated, with negligible performance loss. PerDNN also uses the GPU statistics of edge servers during DNN partitioning to cope with the resource contention caused by multi-client offloading. In simulations driven by human mobility trace datasets and execution profiles of real hardware, PerDNN reduced the occurrence of cold starts by up to 90% and achieved 58% higher throughput when clients change their offloading servers, compared to a baseline without proactive DNN transmission.
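To make the partitioning idea concrete, the sketch below shows one way a latency-minimizing split point could be selected from per-layer execution profiles, network bandwidth, and a server-contention factor derived from GPU statistics. It is a minimal illustration under our own assumptions (a linear-chain DNN, hypothetical profile arrays, and a scalar `server_load` factor), not PerDNN's actual algorithm or API.

```python
# A minimal sketch of latency-driven DNN partitioning. All names and inputs
# (client_ms, server_ms, cut_bytes, server_load) are illustrative assumptions,
# not PerDNN's real interface.

def best_partition(client_ms, server_ms, cut_bytes, uplink_mbps, server_load=1.0):
    """Return the split index k that minimizes estimated end-to-end latency
    when layers [0, k) run on the client and layers [k, n) on the edge server.

    client_ms[i] -- profiled latency of layer i on the mobile client (ms)
    server_ms[i] -- profiled latency of layer i on the edge server (ms)
    cut_bytes[k] -- bytes crossing the network if we cut at k
                    (cut_bytes[0] = model input size, cut_bytes[n] = 0)
    uplink_mbps  -- measured uplink bandwidth (megabits/s)
    server_load  -- contention factor from the server's GPU statistics;
                    values > 1.0 inflate the server-side estimate
    """
    n = len(client_ms)
    best_k, best_latency = n, float("inf")
    for k in range(n + 1):  # k = 0: full offload; k = n: fully local
        transfer_ms = cut_bytes[k] * 8 / (uplink_mbps * 1e3)  # bytes -> ms
        latency = (sum(client_ms[:k])
                   + transfer_ms
                   + server_load * sum(server_ms[k:]))
        if latency < best_latency:
            best_k, best_latency = k, latency
    return best_k, best_latency
```

Because the split decision depends only on these lightweight profiles, such a plan can also be computed speculatively for a predicted next server, which is what lets the server-side layers be shipped ahead of the client's arrival.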