RSCache: A Tail Latency Friendly Cache Based on NVMe SSDs

Jincheng Lu, Miao Cai, Baoliu Ye

Published: 01 Jan 2024, Last Modified: 25 Jul 2025, ISPA 2024, CC BY-SA 4.0

Abstract: High fan-out requests are prevalent in systems with multi-tier architectures. Each such request is divided into several sub-requests that are processed in parallel, and it cannot return until all of its sub-requests have completed. The processing times of sub-requests, however, are unpredictable because sub-requests differ in characteristics such as data volume and data popularity. Meanwhile, existing SSD-based caches struggle to adjust request processing speeds to ensure timely handling. As a result, some sub-requests are delayed, which inflates the overall latency and causes long tail latency. This paper proposes RSCache, a tail latency-friendly cache based on NVMe SSDs. RSCache combines the NVMe Weighted Round Robin (WRR) arbitration mechanism with a priority-based scheduling mechanism to enable differentiated request processing. We propose a fan-out size-based priority assignment strategy along with a latency-aware sub-request sorting method. Together, they prioritize sub-requests according to their impact on tail latency and schedule them to NVMe priority queues with different processing speeds, effectively reducing the variation in processing time among sub-requests. In addition, a novel feedback mechanism balances the load across queues to avoid congestion. We implement a prototype of RSCache based on SPDK. Our experiments demonstrate that RSCache reduces tail latency by up to 40% and improves throughput by two times compared to state-of-the-art SSD-based cache designs.
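
The scheduling idea in the abstract can be illustrated with a small host-side simulation. This is only a sketch under stated assumptions, not the RSCache implementation: the abstract does not give the priority thresholds, the queue weights, the direction of the fan-out-to-priority mapping, or the sorting rule, so `assign_priority`, `WEIGHTS`, the thresholds, and the "slowest estimated sub-request first" ordering below are all hypothetical. Real WRR arbitration is performed by the NVMe controller across submission queues; here it is imitated in plain Python, and the feedback-based load balancing is omitted.

```python
from collections import deque

# Hypothetical weights: commands served per arbitration round for each class.
WEIGHTS = {"high": 4, "medium": 2, "low": 1}


def assign_priority(fan_out, high_threshold=8, medium_threshold=3):
    """Map a request's fan-out size to a priority class.

    Assumption: larger fan-out gets higher priority; thresholds are illustrative.
    """
    if fan_out >= high_threshold:
        return "high"
    if fan_out >= medium_threshold:
        return "medium"
    return "low"


def wrr_drain(queues, weights=WEIGHTS):
    """Drain the queues in weighted round-robin order and return the service order."""
    order = []
    while any(queues.values()):
        for cls, weight in weights.items():
            q = queues[cls]
            # Serve at most `weight` commands from this class per round.
            for _ in range(min(weight, len(q))):
                order.append(q.popleft())
    return order


def schedule(requests):
    """requests: list of (request_id, [estimated latency of each sub-request])."""
    queues = {cls: deque() for cls in WEIGHTS}
    for req_id, sub_latencies in requests:
        cls = assign_priority(len(sub_latencies))
        # Assumed sorting rule: submit the slowest-estimated sub-requests first.
        ranked = sorted(enumerate(sub_latencies), key=lambda p: -p[1])
        for idx, est in ranked:
            queues[cls].append((req_id, idx, est))
    return wrr_drain(queues)


if __name__ == "__main__":
    demo = [
        ("big", [3, 7, 1, 9, 2, 8, 4, 6]),  # fan-out 8
        ("mid", [2, 5, 1]),                 # fan-out 3
        ("small", [4]),                     # fan-out 1
    ]
    for entry in schedule(demo):
        print(entry)
```

Each tuple in the returned order is `(request_id, sub_request_index, estimated_latency)`. With these illustrative weights, the high-priority queue is served four commands for every two from the medium queue and one from the low queue, which is the sense in which the queues have different processing speeds.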