Adaptive Query Scheduling in Key-Value Data Stores

Chen Xu, Mohamed A. Sharaf, Minqi Zhou, Aoying Zhou, Xiaofang Zhou

Published: 2013 | Last Modified: 14 Oct 2024 | Venue: DASFAA (1) 2013 | License: CC BY-SA 4.0

Abstract: Large-scale distributed systems such as Dynamo at Amazon, PNUTS at Yahoo!, and Cassandra at Facebook are rapidly becoming the data management platform of choice for most web applications. These key-value data stores rely on data partitioning and replication to achieve higher levels of availability and scalability. Such design choices typically exhibit a trade-off in which data freshness is sacrificed in favor of reduced access latencies. Hence, it is essential to optimize resource allocation in order to minimize 1) query tardiness, i.e., maximize Quality of Service (QoS), and 2) data staleness, i.e., maximize Quality of Data (QoD). That trade-off between QoS and QoD is further manifested at the local level (i.e., the replica level) and is primarily shaped by the resource allocation strategies deployed for managing the processing of foreground user queries and background system updates. To this end, we propose the AFIT scheduling strategy, which allows for selective data refreshing and integrates the benefits of Shortest Job First (SJF) scheduling with an Earliest Deadline First (EDF)-like policy. Our experiments demonstrate the effectiveness of our method, which not only strikes a fine trade-off between QoS and QoD but also automatically adapts to workload settings.
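
To make the two classical policies named in the abstract concrete, the following minimal sketch orders a batch of queries by SJF and by EDF on a single replica and compares the resulting total tardiness. It is not the AFIT strategy, whose details the abstract does not give. The `Query` type, the `total_tardiness` helper, and both toy workloads are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """Hypothetical query record: names and fields are illustrative only."""
    name: str
    cost: float      # processing time needed on the replica
    deadline: float  # absolute deadline, measured from time 0


def total_tardiness(order):
    """Run the queries back to back and sum max(0, finish - deadline)."""
    clock = 0.0
    total = 0.0
    for q in order:
        clock += q.cost
        total += max(0.0, clock - q.deadline)
    return total


def sjf(queries):
    """Shortest Job First: cheapest query runs first."""
    return sorted(queries, key=lambda q: q.cost)


def edf(queries):
    """Earliest Deadline First: most urgent query runs first."""
    return sorted(queries, key=lambda q: q.deadline)


# Toy workload in which every deadline can be met (assumed values).
light = [Query("a", 3, 3), Query("b", 1, 5), Query("c", 2, 6)]

# Toy workload in which the most urgent query cannot meet its deadline.
overload = [Query("x", 10, 2), Query("y", 1, 3), Query("z", 1, 4)]

if __name__ == "__main__":
    for label, workload in (("light", light), ("overload", overload)):
        print(label,
              "SJF:", total_tardiness(sjf(workload)),
              "EDF:", total_tardiness(edf(workload)))
```

In these two toy workloads, EDF meets every deadline when the load is feasible, while SJF accumulates less tardiness once the most urgent query is already hopelessly late. This is one common intuition for why combining the two policies can pay off, though the sketch says nothing about how AFIT itself combines them or how it handles background updates.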