Abstract: As demand for data processing and Hadoop data center construction grows, simulating large-scale Hadoop data centers with high precision is becoming a major challenge. In this paper, we introduce and implement a new simulator that integrates a baseline test with a multi-layered network model. The baseline test allows precise prediction of the execution time of each MapReduce task. The network model accounts for the complexity of the data center network and provides more accurate predictions of data transfer delay. We evaluate the simulator experimentally in a large-scale data center. Three experimental environments with 35, 47, and 80 nodes are configured in the data center, and Terasort, Wordcount, and Hive are selected as benchmarks, with up to 100 TB of input data. The experiments show that the error between the simulator results and the results from the experimental environments is less than 10% in most cases. A comparison with YARNsim confirms that our simulator is capable of precise simulation of a large-scale data center and that the baseline test model is suitable for optimizing execution-time simulation for any type of Hadoop application.