Abstract: In large-scale data centers, many cloud applications with stringent latency requirements exhibit partition-aggregate patterns. A single job may require responses from thousands of software services, so the tail latency of each participating task must stay within tens to hundreds of microseconds for the application to respond quickly to user operations. To meet this requirement, researchers have proposed a series of microsecond-level task scheduling algorithms. However, these algorithms focus primarily on intra-server scheduling and overlook the influence of inter-server scheduling. Furthermore, they fail to account for inconsistency in the order in which different jobs’ tasks execute across multiple servers. This inconsistency can delay job completion and, in turn, the response to users. To solve these problems, this paper proposes CLLSched, a two-level consistent low-latency scheduling algorithm for microsecond-level tasks that aims to achieve both low tail completion time and high CPU utilization on servers. Across servers, CLLSched employs a server selection strategy based on the power-of-k-choices principle. Within each server, CLLSched implements a fine-grained dynamic core allocation strategy based on task types. In simulations, compared to the state-of-the-art RackSched, CLLSched reduces tail latency for short tasks by 2.78x, improves CPU utilization by 1.14x, and increases cluster throughput by 1.23x. More importantly, it reduces job completion times by at least 60%.
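As a rough illustration of the power-of-k-choices principle mentioned in the abstract, the following minimal sketch samples k servers at random and picks the least loaded of them. It is not CLLSched's actual server selection logic, which the abstract does not specify. The function name `pick_server`, the use of queue length as the load signal, and the default k = 2 are all assumptions made for this example.

```python
import random


def pick_server(queue_lengths, k=2, rng=random):
    """Generic power-of-k-choices selection (illustrative only).

    queue_lengths: per-server load values (assumed here to be queue lengths).
    k: number of servers to sample; clamped to the number of servers.
    rng: the random module or a random.Random instance.
    Returns the index of the least-loaded server among the k sampled.
    """
    k = min(k, len(queue_lengths))
    # Sample k distinct server indices uniformly at random.
    candidates = rng.sample(range(len(queue_lengths)), k)
    # Among the sampled servers, choose the one with the smallest load.
    return min(candidates, key=lambda i: queue_lengths[i])


if __name__ == "__main__":
    loads = [5, 0, 7, 3]  # hypothetical queue lengths for four servers
    print(pick_server(loads, k=2))
```

Sampling only k servers keeps the per-task probing cost small, while comparing their loads avoids the worst placements that purely random selection would make. With k = 2 and distinct samples, for example, the single most-loaded server is never chosen.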