Edge AI as a Service: Configurable Model Deployment and Delay-Energy Optimization With Result Quality Constraints

Wenyu Zhang, Sherali Zeadally, Wei Li, Haijun Zhang, Jingyi Hou, Victor C. M. Leung

Published: 01 Jan 2023, Last Modified: 24 Jul 2025. IEEE Trans. Cloud Comput. 2023. License: CC BY-SA 4.0

Abstract: Breakthroughs in artificial intelligence (AI) techniques have accelerated their application in a wide range of industries, such as security protection, transportation, agriculture, and medical care. With the support of edge computing environments, providing latency-guaranteed AI as a Service (AIaaS) can accelerate the deployment of data-intensive and computation-intensive AI applications and reduce customers' investment costs. However, the deployment architecture, the working mechanism design, and the performance optimization problems specific to AIaaS with configurable data quality and model complexity have not been studied in existing work. To address these gaps, we propose a configurable model deployment architecture (CMDA) for edge AIaaS and present a flexible working mechanism that enables the joint configuration of data quality ratios (DQRs) and model complexity ratios (MCRs) for AI tasks. Combined with commonly used resource allocation operations, this joint configuration lets the manager improve the energy and delay performance of AI services while meeting the desired quality of results (QoRs). We formulate an energy-delay minimization problem under the CMDA framework and propose a polynomial-regression-based relaxation method to solve the task configuration subproblem. We conduct experiments and simulations on the ImageNet classification and common objects in context (COCO) object detection tasks using state-of-the-art deep learning models, and we present the corresponding result quality tables (RQTs) and QoR regression models to illustrate the proposed method. The results for single-task configuration and for multi-task configuration with resource allocation on both tasks demonstrate that the proposed method achieves over $5\times$ HDEC improvement compared with non-optimization schemes, and that joint configuration of DQR and MCR achieves over $1.2\times$ HDEC improvement compared with methods that configure only DQR or only MCR.
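The task configuration idea in the abstract (fit a regression model of QoR over DQR and MCR, then pick the cheapest configuration that still meets the QoR target) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only: the RQT values are synthetic, the degree-2 polynomial, the cost proxy `d * m`, and the function names are hypothetical, and a plain grid search over continuous ratios stands in for whatever solver the paper actually uses.

```python
import numpy as np

def poly_features(d, m):
    """Degree-2 polynomial features of DQR d and MCR m (assumed model form)."""
    d, m = np.asarray(d, dtype=float), np.asarray(m, dtype=float)
    return np.stack([np.ones_like(d), d, m, d * m, d**2, m**2], axis=-1)

def fit_qor_model(dqr, mcr, qor):
    """Least-squares fit of QoR as a polynomial in (DQR, MCR)."""
    X = poly_features(dqr, mcr)
    coef, *_ = np.linalg.lstsq(X, np.asarray(qor, dtype=float), rcond=None)
    return coef

def predict_qor(coef, d, m):
    """Evaluate the fitted QoR polynomial at (d, m)."""
    return poly_features(d, m) @ coef

def configure_task(coef, qor_min, cost_fn, steps=91):
    """Search relaxed (continuous) ratios in [0.1, 1] for the cheapest
    configuration whose predicted QoR is at least qor_min."""
    grid = np.linspace(0.1, 1.0, steps)
    d, m = np.meshgrid(grid, grid, indexing="ij")
    feasible = predict_qor(coef, d, m) >= qor_min
    if not feasible.any():
        return None  # no configuration reaches the requested QoR
    cost = np.where(feasible, cost_fn(d, m), np.inf)
    i, j = np.unravel_index(np.argmin(cost), cost.shape)
    return float(d[i, j]), float(m[i, j]), float(cost[i, j])

# Synthetic RQT (NOT data from the paper): QoR measured on a 4x4 grid of ratios.
levels = np.array([0.25, 0.5, 0.75, 1.0])
D, M = np.meshgrid(levels, levels, indexing="ij")
Q = 0.1 + 0.6 * D + 0.5 * M - 0.2 * D**2 - 0.2 * M**2

coef = fit_qor_model(D.ravel(), M.ravel(), Q.ravel())

# Hypothetical cost proxy: work grows with both data quality and model size.
best = configure_task(coef, qor_min=0.7, cost_fn=lambda d, m: d * m)
infeasible = configure_task(coef, qor_min=0.95, cost_fn=lambda d, m: d * m)
print(best, infeasible)
```

The relaxation is what makes the continuous search possible: the RQT only holds QoR values at a few discrete (DQR, MCR) pairs, while the fitted polynomial gives a smooth surrogate that can be evaluated anywhere in between. In a real deployment the chosen ratios would presumably be mapped back to a supported configuration, and the cost function would combine measured delay and energy rather than this proxy.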