Abstract: Caching is a crucial technique for alleviating the high latency and low bandwidth of cloud databases. However, existing caching algorithms are ill-suited to cloud databases because 1) they cannot adapt to changing workloads, and 2) they are not designed with awareness of data fetching costs. Combining learning-based models with cost-aware caching algorithms is a natural way to improve performance. However, doing so is challenging because there is no oracle algorithm to guide the learning model. Moreover, current learning models incur significant computation overheads that can worsen the performance of cloud databases. In this paper, we propose LBSC, a learning-based cost-aware caching framework for cloud databases that ensures faster query execution and robust performance under dynamic workloads. We first introduce an approximately optimal oracle algorithm, BeladySizeCost, which retains data items that have a high cost per byte and are likely to be accessed in the near future. Then, we present a lightweight supervised learning model that learns from BeladySizeCost to predict the eviction probability of cached data. We also design effective optimizations to reduce the computation overheads of the learning-based algorithm. Extensive experiments in both simulations and real-world cloud databases demonstrate that the proposed framework significantly outperforms state-of-the-art baselines.
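The abstract does not state BeladySizeCost's exact eviction rule, so the following is only a rough illustration of the idea it describes: an offline oracle that knows the future request trace and prefers to keep items with a high fetch cost per byte that will be requested soon. The score used here, (cost / size) divided by the distance to the next access, is an assumption chosen for illustration and is not the paper's definition. The names `Item`, `next_access`, `score`, and `simulate` are hypothetical, and the sketch always admits a missed item if it fits within capacity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    key: str
    size: int    # bytes occupied in the cache
    cost: float  # cost of fetching the item from the cloud database (hypothetical units)


def next_access(trace, key, now):
    """Index of the next request for `key` after position `now`, or None if never reused."""
    for i in range(now + 1, len(trace)):
        if trace[i] == key:
            return i
    return None


def score(item, trace, now):
    """Assumed retention score: cost per byte, discounted by how far away the next use is."""
    nxt = next_access(trace, item.key, now)
    if nxt is None:
        return 0.0  # never requested again, so it is the cheapest item to evict
    return (item.cost / item.size) / (nxt - now)


def simulate(trace, items, capacity):
    """Replay `trace` with full future knowledge and return the total fetch cost paid on misses."""
    cache = {}
    used = 0
    total_cost = 0.0
    for now, key in enumerate(trace):
        if key in cache:
            continue  # hit: no fetch cost
        item = items[key]
        total_cost += item.cost  # miss: pay the fetch cost
        if item.size > capacity:
            continue  # item can never fit, so it is not cached
        # Evict the lowest-scoring items until the new item fits.
        while used + item.size > capacity:
            victim = min(cache.values(), key=lambda it: score(it, trace, now))
            del cache[victim.key]
            used -= victim.size
        cache[key] = item
        used += item.size
    return total_cost
```

For example, with unit-size items `a` (cost 10), `b` (cost 1), and `c` (cost 5), a capacity of 2, and the trace `a, b, c, a, b`, the oracle evicts the cheap, later-needed `b` when `c` arrives, keeps the expensive `a` for its upcoming hit, and pays a total fetch cost of 17. A learned model in the spirit of the abstract would then be trained to approximate such oracle eviction decisions from features available online, without future knowledge.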