Abstract: EMR (Elastic MapReduce) is a service offered by mainstream cloud vendors that lets data processing users directly obtain well-managed Hadoop YARN clusters in the cloud. A preemptible instance is a type of cloud server that is inexpensive but may be suddenly reclaimed by the cloud vendor. EMR clusters running on preemptible instances rely on YARN's built-in fault tolerance, which is limited. In this paper, we present PrTaurus, an availability-enhanced EMR service on preemptible instances. PrTaurus integrates a Docker-based system-level checkpoint capability into YARN to further improve its fault tolerance. In addition, PrTaurus's scheduling strategy takes advantage of Alibaba Cloud's one-hour protection policy. Furthermore, we propose a new instance selection method that jointly considers cost-efficiency, preemption risk, and overhead. We evaluated PrTaurus through simulations driven by real-world workload and instance price traces. Experimental results show that, compared with existing EMR clusters running on preemptible instances, PrTaurus significantly reduces cost (by 13.0%-74.6%), instance preemptions (by 60.3%-88.9%), and task preemptions (by 86.0%-98.6%).
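The abstract does not give the formula behind instance selection. To illustrate how cost-efficiency, preemption risk, and overhead can be combined into a single ranking, here is a minimal sketch. It is not PrTaurus's method. The `InstanceType` fields, the per-hour preemption probability, and the "useful fraction" model are all hypothetical assumptions: each hour, with probability `p` the instance is reclaimed and roughly `overhead` hours of work are lost to restart and checkpoint restore.

```python
from dataclasses import dataclass


@dataclass
class InstanceType:
    # All fields are hypothetical inputs for illustration only.
    name: str
    price_per_hour: float      # spot price per instance-hour
    vcpus: int                 # capacity used as the unit of useful work
    preemption_prob: float     # assumed probability of reclamation per hour, in [0, 1]
    overhead_hours: float      # assumed work lost per preemption (restart + restore)


def expected_cost_per_vcpu_hour(t: InstanceType) -> float:
    """Expected price of one *useful* vCPU-hour under the assumed loss model."""
    # Expected fraction of each hour that still does useful work.
    useful_fraction = 1.0 - t.preemption_prob * t.overhead_hours
    if useful_fraction <= 0.0:
        # Preemptions wipe out all progress: never worth selecting.
        return float("inf")
    return t.price_per_hour / (t.vcpus * useful_fraction)


def select_instances(candidates: list[InstanceType], k: int) -> list[InstanceType]:
    """Pick the k candidates with the lowest expected cost per useful vCPU-hour."""
    return sorted(candidates, key=expected_cost_per_vcpu_hour)[:k]


if __name__ == "__main__":
    pool = [
        InstanceType("a", 0.10, 2, 0.0, 0.10),
        InstanceType("b", 0.08, 2, 0.5, 0.50),
        InstanceType("c", 0.30, 8, 0.2, 0.25),
    ]
    for inst in select_instances(pool, 2):
        print(inst.name, round(expected_cost_per_vcpu_hour(inst), 4))
```

In this toy model, the cheapest listed price ("b") does not win. Its high preemption risk combined with a large restart overhead raises its effective cost, which is the kind of trade-off the selection method in the abstract is meant to capture.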