Keywords: Machine Learning, Machine Learning Systems, Function as a Service (FaaS), Machine Learning as a Service (MLaaS), Infrastructure as a Service (IaaS), Edge Computing, Model Decomposition, Model Quantization, Knowledge Distillation, Early-Exit / Internal Classifiers, Autoencoders, Federated Learning, Split-Federated Learning (SplitFed), Privacy-Preserving Training
Abstract: Intelligent edge applications such as VR/AR and surveillance have become popular with the growth of IoT and mobile devices.
However, edge devices with limited capacity struggle to serve increasingly large and complex deep learning (DL) models.
To mitigate these challenges, researchers have proposed optimizing DL models and offloading their partitions among user devices, edge servers, and the cloud.
In this setting, users can draw on different services to support their intelligent applications: edge resources offer low response latency, while cloud platforms provide computation resources at low monetary cost for computation-intensive workloads.
However, communication between DL model partitions can introduce transmission bottlenecks and pose risks of data leakage.
Recent research aims to balance accuracy, computation delay, transmission delay, and privacy concerns.
It addresses these issues with model compression, knowledge distillation, transmission compression, and model architecture adaptations such as early-exit internal classifiers.
This survey develops a systematic evaluation approach for state-of-the-art model offloading methods and model adaptation techniques.
We formulate an optimization problem for edge deep neural network (DNN) offloading that jointly optimizes inference and training latency, data privacy, and monetary resource cost.
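For illustration only, a minimal sketch of what such a joint formulation might look like; the decision variable x, the objective terms, the weights \lambda_i, and the capacity constraint below are hypothetical placeholders, not the survey's actual notation:

% Illustrative sketch; all symbols are hypothetical, not the survey's exact formulation.
% x: an assignment of DNN partitions across device, edge, and cloud tiers.
\min_{x \in \mathcal{X}} \;
  \lambda_1 L_{\mathrm{inf}}(x)
  + \lambda_2 L_{\mathrm{train}}(x)
  + \lambda_3 P(x)
  + \lambda_4 C(x)
\quad \text{s.t.} \quad R_t(x) \le B_t \quad \forall\, \text{tiers } t,

where L_{\mathrm{inf}}(x) and L_{\mathrm{train}}(x) denote inference and training latency under placement x, P(x) a privacy-leakage measure, C(x) monetary cost, \lambda_i their relative weights, and R_t(x) \le B_t the resource capacity constraint at each tier.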
Area: Systems for ML and ML for systems
Type: Systemization of Knowledge (SoK)
Conflicts: None
Potential Reviewers: Amir H. Payberah, Mohammad Shahrad, Pooyan Jamshidi, Tianyin Xu
Revision: No
Contact Email: zhangzs@bu.edu
Submission Number: 1