Abstract: The pervasive integration of deep neural networks (DNNs) into smart devices has significantly increased computational workloads, intensifying pressure on real-time performance and device power consumption. Offloading segments of a DNN to the edge has emerged as an effective strategy for reducing latency and device power usage. Nonetheless, determining how much workload to offload is a complex challenge, particularly under fluctuating device workloads and varying wireless signal strengths. This paper introduces a streamlined approach for swiftly and accurately forecasting the computing latency of a DNN. Building upon this, an adaptive neurosurgeon framework is proposed to dynamically select the optimal partition point of a DNN at runtime, effectively minimizing computing latency. Experimental validation demonstrates that the proposed adaptive neurosurgeon outperforms existing state-of-the-art approaches, such as the autodidactic neurosurgeon, in reducing computing latency under changing on-device DNN workloads and varying wireless communication capabilities.
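The partition-point selection described above can be sketched as a simple search over candidate split layers: layers before the split run on-device, the intermediate activation is uploaded over the current wireless link, and the remaining layers run at the edge. The function and all numbers below are illustrative assumptions, not the paper's actual predictor or measurements.

```python
def best_partition(device_ms, edge_ms, act_mb, bw_mbps):
    """Pick the DNN split minimizing end-to-end latency (a hypothetical sketch).

    device_ms[i] / edge_ms[i]: predicted latency (ms) of layer i on device / edge.
    act_mb[k]: size (MB) of the tensor crossing a split before layer k,
               so act_mb[0] is the raw input and act_mb[n] the final output.
    bw_mbps: current uplink bandwidth in megabits per second.
    Layers [0, k) run on-device; layers [k, n) run at the edge.
    """
    n = len(device_ms)
    best_k, best_t = 0, float("inf")
    for k in range(n + 1):
        upload_ms = act_mb[k] * 8 / bw_mbps * 1000  # MB -> Mb, s -> ms
        total = sum(device_ms[:k]) + upload_ms + sum(edge_ms[k:])
        if total < best_t:
            best_k, best_t = k, total
    return best_k, best_t

# Illustrative profile: a 3-layer model whose activations shrink layer by layer.
split, latency = best_partition(
    device_ms=[5.0, 10.0, 20.0],
    edge_ms=[1.0, 2.0, 4.0],
    act_mb=[4.0, 2.0, 0.5, 0.1],
    bw_mbps=100.0,
)
```

An adaptive scheme such as the one proposed here would re-run this selection as the latency predictions and measured bandwidth change at runtime, rather than fixing the split offline.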