Efficient Microsecond-scale Blind Scheduling with Tiny Quanta

Zhihong Luo, Sam Son, Dev Bali, Emmanuel Amaro, Amy Ousterhout, Sylvia Ratnasamy, Scott Shenker

Published: 2024, Last Modified: 03 Mar 2025 | ASPLOS (2) 2024 | CC BY-SA 4.0

Abstract: A longstanding performance challenge in datacenter-based applications is how to efficiently handle incoming client requests that spawn many very short (μs scale) jobs that must be handled with high throughput and low tail latency. When no assumptions are made about the duration of individual jobs, or even about the distribution of their durations, this requires blind scheduling with frequent and efficient preemption, which is not scalably supported for μs-level tasks. We present Tiny Quanta (TQ), a system that enables efficient blind scheduling of μs-level workloads. TQ performs fine-grained preemptive scheduling and does so with high performance via a novel combination of two mechanisms: forced multitasking and two-level scheduling. Evaluations with a wide variety of μs-level workloads show that TQ achieves low tail latency while sustaining 1.2x to 6.8x the throughput of prior blind scheduling systems.
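To illustrate why blind scheduling relies on frequent preemption, the following is a minimal single-core sketch comparing run-to-completion (FCFS) with round-robin preemption at a small quantum. It is not TQ's implementation and does not model forced multitasking or two-level scheduling. The function names, the workload, and the quantum value are assumptions chosen for illustration, and preemption is treated as free, which is exactly the cost a real system has to keep low.

```python
from collections import deque


def fcfs(durations):
    """Run each job to completion in arrival order; return finish times."""
    t = 0
    finish = []
    for d in durations:
        t += d
        finish.append(t)
    return finish


def round_robin(durations, quantum):
    """Preempt after every `quantum` time units without knowing job lengths.

    Assumes all jobs arrive at time 0 and that a context switch costs nothing.
    """
    remaining = list(durations)
    finish = [0] * len(durations)
    queue = deque(range(len(durations)))
    t = 0
    while queue:
        i = queue.popleft()
        run = min(quantum, remaining[i])
        t += run
        remaining[i] -= run
        if remaining[i] > 0:
            queue.append(i)  # not finished: back of the queue
        else:
            finish[i] = t
    return finish


# Hypothetical workload: one long job ahead of three short ones (time units in μs).
jobs = [100, 1, 1, 1]
print(fcfs(jobs))            # [100, 101, 102, 103]
print(round_robin(jobs, 2))  # [103, 3, 4, 5]
```

In this toy workload the short jobs finish after 3 to 5 μs under preemption instead of 101 to 103 μs under FCFS, while the long job is delayed only slightly. The scheduler never inspects job durations, which is what makes it blind.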