Spark性能优化技术研究综述 (Survey on Performance Optimization Technologies for Spark)

Published: 01 Jan 2018, Last Modified: 13 Nov 2024计算机科学 2018EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: In recent years,with the advent of the era of big data,big data processing platform is developing very fast.A large number of big data processing platforms,including Hadoop,Spark,Strom and etc.,have appeared,among which Apache Spark is the most prominent one.With the wide applications of Spark at home and abroad,there are many performance problems to be solved.As the underlying implementation mechanism of Spark is very complex,it is difficult for ordinary users to find performance bottlenecks,let alone further optimization.In light of the above problems,the performance optimization technologies for Sparkwere summarized and analyzed from five aspects,including development principles optimization,memory optimization,configuration parameter optimization,scheduling optimization and shuffle process optimization.Finally,the key problems of Spark optimization technologies were summarized and future research issues were proposed.
Loading