Abstract: To support diverse application scenarios, big data processing frameworks (BDPFs) such as Spark usually expose a large number of performance-critical configuration parameters. Since manual configuration is both labor-intensive and time-consuming, automatically tuning the configuration parameters of BDPFs to achieve better performance has become an urgent need. To simultaneously address the associated challenges, such as the high-dimensional configuration space, we propose ATConf, a new black-box approach for automatically tuning the internal and external configuration parameters of BDPFs. Experimental results on our local distributed Spark cluster show that the best execution time achieved by ATConf is up to 46.52% lower than that of the default configuration. Moreover, compared with four baselines, ATConf further reduces the execution time relative to the default configuration by at least 4.10% under the same constraint on the number of observations.