Privacy-Aware Joint DNN Model Deployment and Partitioning Optimization for Collaborative Edge Inference Services

Zhipeng Cheng, Xiaoyu Xia, Hong Wang, Minghui Liwang, Ning Chen, Xuwei Fan, Xianbin Wang

Published: 01 Sept 2025, Last Modified: 21 Jan 2026, IEEE Transactions on Services Computing, CC BY-SA 4.0
Abstract: Edge inference (EI) has emerged as a promising paradigm to address the growing limitations of cloud-based Deep Neural Network (DNN) inference services, such as high response latency, limited scalability, and severe data privacy exposure. However, deploying DNN models on resource-constrained edge devices introduces challenges of its own: limited computation and storage resources, dynamic service demands, and heightened privacy risks. To address these challenges, this paper presents a novel privacy-aware optimization framework that jointly tackles DNN model deployment, user-server association, and model partitioning, aiming to minimize the long-term average inference delay under resource and privacy constraints. We formulate this joint problem as an NP-hard stochastic optimization problem. To handle system dynamics and computational complexity efficiently, we adopt a Lyapunov-based approach that transforms the long-term objective into tractable per-slot decisions. Additionally, we introduce a coalition formation game model to enable adaptive user-server association, and design a greedy algorithm for model deployment within each coalition. Extensive simulations demonstrate that the proposed algorithm significantly reduces inference delay while consistently satisfying privacy constraints, outperforming baselines across diverse scenarios.
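To illustrate how a Lyapunov-based approach can turn a long-term constraint into per-slot decisions, here is a minimal, generic drift-plus-penalty sketch. It is not the paper's formulation: it assumes a single user and server, treats privacy as a time-average leakage budget tracked by a virtual queue Q(t+1) = max(Q(t) + L(t) - budget, 0), and picks, in each slot, the split point minimizing V * delay + Q(t) * leakage. The names `PartitionOption`, `choose_partition`, `update_queue`, and `run`, as well as all delay and leakage values, are hypothetical. Model deployment and coalition-based user-server association are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionOption:
    """Hypothetical candidate split point: per-slot delay (ms) and a privacy-leakage score."""
    split_layer: int
    delay: float
    leakage: float


def choose_partition(options, queue, v):
    """Drift-plus-penalty rule: minimize V * delay + Q(t) * leakage for this slot.

    min() returns the first option among ties, so list order breaks ties.
    """
    return min(options, key=lambda o: v * o.delay + queue * o.leakage)


def update_queue(queue, leakage, budget):
    """Virtual queue Q(t+1) = max(Q(t) + leakage(t) - budget, 0).

    A queue that stays bounded means the time-average leakage meets the budget.
    """
    return max(queue + leakage - budget, 0.0)


def run(options, budget, v, slots):
    """Simulate `slots` decisions; return (split_layer, leakage, queue_after) per slot."""
    queue = 0.0
    history = []
    for _ in range(slots):
        choice = choose_partition(options, queue, v)
        queue = update_queue(queue, choice.leakage, budget)
        history.append((choice.split_layer, choice.leakage, queue))
    return history


if __name__ == "__main__":
    # Hypothetical options: early split = fast but leaky, fully local = slow but private.
    opts = [
        PartitionOption(split_layer=0, delay=20.0, leakage=1.0),
        PartitionOption(split_layer=3, delay=35.0, leakage=0.4),
        PartitionOption(split_layer=8, delay=60.0, leakage=0.0),
    ]
    hist = run(opts, budget=0.5, v=0.1, slots=300)
    avg = sum(h[1] for h in hist) / len(hist)
    print(f"average leakage {avg:.3f}, final queue {hist[-1][2]:.2f}")
```

The parameter V sets the trade-off. A larger V weights delay more heavily and lets the queue (and thus transient budget violations) grow before the controller switches to a more private split. A smaller V enforces the budget more tightly at the cost of higher delay.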