QoS Awareness and Improved Throughput of Point Cloud Services With Dynamic Workloads

Kaihua Fu, Jiuchen Shi, Yao Chen, Quan Chen, Weng-Fai Wong, Wei Wang, Bingsheng He, Minyi Guo

Published: 01 Mar 2026, Last Modified: 14 Mar 2026, IEEE Transactions on Computers, CC BY-SA 4.0
Abstract: Deep learning on 3D point clouds plays a vital role in a wide range of applications, such as AR/VR visualization, 3D clothing virtual try-on, and game rendering. Because some of these applications require low latency, point cloud services are also deployed in datacenters with powerful GPUs. However, queries to point cloud services exhibit varied workload change patterns due to their different degrees of sparsity, so current batching-based serving schemes result in either long latency or low throughput. We propose a scheme called Volans to address these challenges and effectively support point cloud services. Volans comprises a workload predictor, a topology deployer, and a progress-aware scheduler. The predictor partitions the input query into a grid and estimates the workload changes. The deployer then splits the model into several stages and determines the batch size for each stage based on these workload changes. The scheduler reduces QoS violations when queries run slower than expected due to unpredicted workload spikes. Experiments show that, compared to state-of-the-art techniques, Volans improves the peak supported throughput by up to 31.1% while maintaining the required 99th-percentile latencies.
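The abstract does not say how the predictor grids a query, so the following is only an illustrative sketch and not the Volans implementation. It assumes a uniform voxel grid and uses the number of occupied voxels as a stand-in for workload. The names `occupied_voxels`, `workload_proxy`, and `voxel_size` are hypothetical.

```python
from math import floor


def occupied_voxels(points, voxel_size):
    """Return the set of voxel indices that contain at least one point.

    Assumption: a uniform cubic grid; the paper's actual gridding
    scheme is not described in the abstract.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    return {
        (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        for x, y, z in points
    }


def workload_proxy(points, voxel_size):
    """Hypothetical workload estimate: the count of occupied voxels."""
    return len(occupied_voxels(points, voxel_size))


# Two queries with the same number of points but different spatial spread.
clustered = [(0.1, 0.1, 0.1), (0.2, 0.1, 0.3), (0.4, 0.4, 0.4)]
scattered = [(0.1, 0.1, 0.1), (2.5, 0.1, 0.3), (5.4, 4.4, 0.4)]

print(workload_proxy(clustered, 1.0))
print(workload_proxy(scattered, 1.0))
```

Under this assumption, two queries with equal point counts can yield different estimates, which is the kind of per-query variation that a fixed batch size cannot account for.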