Abstract: AI-inspired methods have reshaped the network QoS landscape, bringing better demand-guided experiences to end-users. However, meeting ever-growing experience demands requires larger AI models, whose inference overhead becomes a non-negligible drawback in time-sensitive network QoS. In this work, we define this challenge as the inference-QoS (iQoS) problem of network QoS itself: balancing inference efficiency against performance for AI services. We design a unified iQoS metric that evaluates AI-enhanced QoS frameworks by jointly considering model performance, inference latency, and input scale. We then propose a two-stage pipeline as an exemplar of leveraging the iQoS metric in QoS-aware AI services: (i) to enhance reconstruction ability, a pretrained masked autoencoder extracts intrinsic data correlations via multi-scale masking; (ii) to improve inference efficiency, a forecasting masked decoder prunes the input scale along both spatial and temporal dimensions for prediction. Comprehensive experiments demonstrate that our method achieves superior inference latency and strong traffic matrix prediction performance.
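The abstract names the ingredients of the unified iQoS metric (model performance, inference latency, input scale) without giving its functional form. A minimal sketch of how such a score could combine the three terms is shown below; the weighted-sum form, the normalization budgets, and all parameter names are illustrative assumptions, not the paper's actual definition.

```python
def iqos_score(performance: float, latency_s: float, input_scale: int,
               latency_budget_s: float = 0.1, max_scale: int = 10_000,
               weights: tuple = (0.6, 0.3, 0.1)) -> float:
    """Hypothetical iQoS-style score: weighted sum of a task metric,
    a normalized latency term, and a normalized input-scale term.

    performance : task metric in [0, 1], higher is better
    latency_s   : measured inference latency in seconds, lower is better
    input_scale : number of input elements fed to the model, lower is better
    """
    w_perf, w_lat, w_scale = weights
    # Latency and scale are mapped to [0, 1] rewards against assumed budgets.
    latency_term = max(0.0, 1.0 - latency_s / latency_budget_s)
    scale_term = max(0.0, 1.0 - input_scale / max_scale)
    return w_perf * performance + w_lat * latency_term + w_scale * scale_term

# Under this toy scoring, a model that prunes its input (stage ii) can trade
# a small accuracy drop for a better overall score:
full = iqos_score(performance=0.95, latency_s=0.08, input_scale=8000)
pruned = iqos_score(performance=0.93, latency_s=0.02, input_scale=2000)
```

In this sketch, `pruned` scores higher than `full`, illustrating the efficiency/performance trade-off the iQoS problem formalizes.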