Privacy-Aware Joint DNN Model Deployment and Partitioning Optimization for Collaborative Edge Inference Services

Zhipeng Cheng, Xiaoyu Xia, Hong Wang, Minghui Liwang, Ning Chen, Xuwei Fan, Xianbin Wang

Published: 01 Sept 2025, Last Modified: 21 Jan 2026. Venue: IEEE Transactions on Services Computing. License: CC BY-SA 4.0

Abstract: Edge inference (EI) has emerged as a promising paradigm to address the growing limitations of cloud-based Deep Neural Network (DNN) inference services, such as high response latency, limited scalability, and severe data privacy exposure. However, deploying DNN models on resource-constrained edge devices introduces additional challenges, including limited computation and storage resources, dynamic service demands, and heightened privacy risks. To address these challenges, this paper presents a privacy-aware optimization framework that jointly tackles DNN model deployment, user-server association, and model partitioning, aiming to minimize long-term average inference delay under resource and privacy constraints. We formulate this as an NP-hard stochastic optimization problem. To handle system dynamics and computational complexity efficiently, we adopt a Lyapunov-based approach that transforms the long-term objective into tractable per-slot decisions. Additionally, we introduce a coalition formation game model to facilitate adaptive user-server association, and design a greedy algorithm for model deployment within each coalition. Extensive simulations demonstrate that the proposed algorithm significantly reduces inference delay while consistently satisfying privacy constraints, outperforming baselines across diverse scenarios.
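
The abstract does not give the paper's formulation, so the following is only a generic sketch of how a Lyapunov drift-plus-penalty scheme can turn a long-term average constraint into per-slot decisions. Everything in it is an assumption for illustration: the `Option` type, the single privacy "virtual queue", the per-slot `budget`, the trade-off weight `v`, and the two example options are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Option:
    """A hypothetical per-slot decision (e.g. one candidate partition point)."""
    name: str
    delay: float         # inference delay incurred in this slot
    privacy_cost: float  # privacy leakage incurred in this slot


def choose(options: List[Option], queue: float, v: float) -> Option:
    """Pick the option minimizing the drift-plus-penalty score.

    Score = v * delay + queue * privacy_cost. A large queue backlog makes
    privacy-costly options less attractive in the current slot.
    """
    return min(options, key=lambda o: v * o.delay + queue * o.privacy_cost)


def update_queue(queue: float, privacy_cost: float, budget: float) -> float:
    """Virtual queue update: backlog grows when the slot exceeds the budget."""
    return max(queue + privacy_cost - budget, 0.0)


def run(options: List[Option], budget: float, v: float,
        slots: int) -> Tuple[float, float]:
    """Simulate `slots` time slots; return (average delay, average privacy cost)."""
    queue = 0.0
    total_delay = 0.0
    total_cost = 0.0
    for _ in range(slots):
        opt = choose(options, queue, v)
        total_delay += opt.delay
        total_cost += opt.privacy_cost
        queue = update_queue(queue, opt.privacy_cost, budget)
    return total_delay / slots, total_cost / slots


if __name__ == "__main__":
    # Hypothetical options: run fully on-device (slow, private) or
    # offload early layers' output to a server (fast, leaks more).
    opts = [
        Option("local", delay=4.0, privacy_cost=0.0),
        Option("offload", delay=1.0, privacy_cost=1.0),
    ]
    avg_delay, avg_cost = run(opts, budget=0.5, v=1.0, slots=1000)
    print(f"average delay: {avg_delay:.3f}, average privacy cost: {avg_cost:.3f}")
```

In this toy setting, increasing `v` weights delay more heavily, at the price of a larger queue backlog before the privacy budget is enforced; the long-run average privacy cost still approaches the budget as long as the queue stays bounded.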

External IDs: doi:10.1109/tsc.2025.3607117