Abstract: The MapReduce framework and its open-source implementation, Hadoop, have established themselves as one of the most popular tools for analyzing large data sets. They are widely used by many cloud service providers, such as Amazon EC2 Cloud. However, as latency-sensitive applications become more and more important, Hadoop shows its shortcomings in ensuring that jobs complete on time. Moreover, users currently have to provide a metric to evaluate the performance of different clients. Motivated by this, we propose an algorithm, CP-Scheduler (CPS), which uses an optimizer to find the best schedule in order to minimize the number of delayed jobs. In addition, because Hadoop handles heterogeneous computing poorly, our algorithm can also adapt to different remote machines. These two features make it more efficient than the default Hadoop scheduler. The proposed algorithm is initially evaluated with a simulator designed for Hadoop. Experimental results show that the number of jobs missing their deadlines decreases by 60 percent on average across scenarios of different sizes.