Identification of critical parameters for MapReduce energy efficiency using statistical Design of Experiments
Abstract: Energy efficiency is an important concern for data centers today, and most of these data centers use MapReduce frameworks for big data processing. These frameworks and modern hardware expose parameters that control the performance and energy consumption of the system. However, tuning these parameters to reduce energy consumption without impacting performance is challenging because: 1) there are a large number of parameters across the layers of the frameworks; 2) the impact of a parameter varies with workload characteristics; 3) the same parameter may have conflicting effects on performance and energy; and 4) parameters may have interaction effects. To streamline parameter tuning, we present a systematic design of experiments to study the effects of different parameters on performance and energy consumption, with the aim of identifying the most influential ones quickly and efficiently. The identified parameters can then be used to build predictive models for tuning the environment. We perform a detailed analysis of the main and interaction effects of rationally selected parameters on performance and energy consumption for typical MapReduce workloads. Based on a relatively small number of experiments, we find that replication-factor has the highest impact and, surprisingly, compression has the least impact on the energy efficiency of MapReduce systems. Furthermore, from the results of the factorial design we infer that the two-way interactions between block-size, Map-slots, and CPU-frequency, parameters of the Hadoop platform, have a high impact on the energy efficiency of all types of workloads owing to its distributed, parallel, pipelined design.
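To illustrate the kind of factorial analysis the abstract refers to, the following is a minimal sketch of estimating main and two-way interaction effects in a 2-level full factorial design. The factor names follow those mentioned in the abstract, but the coding, design size, and energy measurements are purely hypothetical, not values from the paper.

```python
from itertools import product

# Hypothetical 2^3 full-factorial design over three Hadoop-level factors
# named in the abstract: block-size, Map-slots, CPU-frequency.
# Each factor is coded -1 (low level) / +1 (high level).
factors = ["block_size", "map_slots", "cpu_freq"]
design = list(product([-1, 1], repeat=len(factors)))

# Illustrative energy-consumption measurements (one per design point,
# in the same order as `design`); real values would come from experiments.
energy = [910, 870, 820, 760, 905, 850, 800, 735]

def effect(signs):
    """Average response at +1 minus average response at -1 for a contrast."""
    hi = [y for s, y in zip(signs, energy) if s == 1]
    lo = [y for s, y in zip(signs, energy) if s == -1]
    return sum(hi) / len(hi) - sum(lo) / len(lo)

# Main effects: the contrast is the factor's own column of the design matrix.
for i, name in enumerate(factors):
    print(f"main effect of {name}: {effect([row[i] for row in design]):.1f}")

# Two-way interaction effects: the contrast is the element-wise product
# of the two factors' columns.
for i in range(len(factors)):
    for j in range(i + 1, len(factors)):
        signs = [row[i] * row[j] for row in design]
        print(f"interaction {factors[i]} x {factors[j]}: {effect(signs):.1f}")
```

Effects with large magnitude relative to experimental noise are the candidates for inclusion in a predictive tuning model; the same contrast-based computation extends to fractional factorial designs, which keep the number of required experiment runs small.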