Abstract: Serverless Edge Computing (SEC) has been widely adopted to improve resource utilization through its event-driven execution model. However, deploying deep neural network (DNN) inference services directly in SEC leads to resource inefficiency, which stems from two key factors. First, existing methods adopt model-wise function encapsulation, which requires the entire DNN model to occupy memory throughout its execution lifecycle, increasing both the memory footprint and the occupancy time. Second, applying uniform DNN inference to diverse inputs incurs redundant computation and additional inference time. To this end, we propose REDI, a novel framework that leverages fine-grained block-wise function encapsulation and progressive inference to provide resource-efficient DNN inference while meeting latency requirements. REDI releases the memory of already-executed shallow blocks and allows each request to exit early based on the complexity of its input, eliminating redundant computation. To fully unleash this potential, REDI jointly considers resource heterogeneity, data diversity, and environment dynamics in the block-wise function placement problem, for which we introduce an uncertainty-aware online learning-driven algorithm with bounded regret. Finally, we conduct extensive trace-driven experiments to evaluate our methods, demonstrating that REDI achieves an improvement of up to $6.52\times$ in resource usage cost compared with state-of-the-art methods.
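The block-wise progressive inference idea in the abstract can be sketched as follows. This is a minimal illustrative sketch, not REDI's actual implementation: the block functions, the auxiliary confidence function, and the exit threshold are all hypothetical stand-ins for the paper's DNN sub-networks and exit classifiers.

```python
# Sketch of block-wise progressive inference with early exit.
# Each "block" stands in for a DNN sub-network encapsulated as its own
# function; after each block, an auxiliary confidence score is computed,
# and the request exits as soon as the score crosses a threshold, so
# deeper blocks (and their memory) are never touched for easy inputs.

from typing import Callable, List, Tuple

def progressive_infer(
    x: float,
    blocks: List[Callable[[float], float]],
    confidence: Callable[[float], float],
    threshold: float = 0.9,
) -> Tuple[float, int]:
    """Run blocks in order, exiting early once confidence is high enough.

    Returns the final representation and the number of blocks executed.
    """
    h = x
    for depth, block in enumerate(blocks, start=1):
        h = block(h)  # run one sub-network; earlier blocks can now be freed
        if confidence(h) >= threshold:
            return h, depth  # early exit: skip the remaining deeper blocks
    return h, len(blocks)  # hardest inputs traverse the full model

# Toy usage: confidence grows with the magnitude of the representation.
blocks = [lambda h: h * 2, lambda h: h * 2, lambda h: h * 2]
conf = lambda h: min(abs(h) / 8.0, 1.0)
out, depth = progressive_infer(1.0, blocks, conf)    # needs all 3 blocks
out2, depth2 = progressive_infer(4.0, blocks, conf)  # exits after 1 block
```

A "complex" input (here, one starting far from the exit threshold) runs more blocks than a "simple" one, which is the source of the per-request compute savings the abstract describes.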
External IDs: dblp:journals/tmc/GuoDSHZ25