Rim: A reusable iterative model for big data

Published: 01 Jan 2018, Last Modified: 15 Nov 2024, Venue: Knowl. Based Syst. 2018, License: CC BY-SA 4.0
Abstract: In big data environments, iterative computing is widely used in applications such as data mining, machine learning and graph analysis. Many iterative computing models have been proposed to execute iterative algorithms on big data efficiently. However, re-iterating over the entire dataset is inefficient when only part of it changes, for example when some data is added or removed. This paper presents Rim, a Reusable Iterative computing Model that calculates the new iterative results from the updated dataset and the original iterative results, avoiding re-iteration on the entire dataset. We state the conditions under which Rim applies, mathematically prove its accuracy and performance advantages, and describe its application to three typical iterative algorithms: PageRank, K-means and Descendant-query. Finally, we implement Rim in Spark and evaluate its performance on different test cases and iterative algorithms. For PageRank, K-means and Descendant-query, experiments show that our approach is on average 1.34×, 2.51× and 3.17× faster, respectively, than re-iteration on massive datasets.
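
The abstract does not spell out how Rim combines the original results with the updated dataset, so the following is only a minimal sketch of the general idea of reusing earlier iterative results, not Rim's actual algorithm. It assumes a plain single-machine PageRank with power iteration and seeds the computation on a changed graph with the ranks obtained before the change (a warm start). The function name `pagerank`, its parameters and the tiny example graph are all hypothetical and chosen for illustration.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=1000, init=None):
    """Power-iteration PageRank over a list of (source, target) edges.

    `init` is an optional dict of previously computed ranks that seeds
    the iteration (warm start). Returns (ranks, iterations_used).
    Illustrative only; this is not the Rim model from the paper.
    """
    nodes = sorted({u for u, _ in edges} | {v for _, v in edges})
    n = len(nodes)
    out = {u: [] for u in nodes}
    for u, v in edges:
        out[u].append(v)

    if init is None:
        # Cold start: uniform distribution.
        rank = {u: 1.0 / n for u in nodes}
    else:
        # Warm start: reuse old ranks, give unseen nodes 1/n, renormalize.
        rank = {u: init.get(u, 1.0 / n) for u in nodes}
        total = sum(rank.values())
        rank = {u: r / total for u, r in rank.items()}

    for it in range(1, max_iter + 1):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not out[u])
        base = (1.0 - damping) / n + damping * dangling / n
        new = {u: base for u in nodes}
        for u in nodes:
            if out[u]:
                share = damping * rank[u] / len(out[u])
                for v in out[u]:
                    new[v] += share
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            return rank, it
    return rank, max_iter


# Original dataset and its iterative result.
old_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]
old_rank, _ = pagerank(old_edges)

# Updated dataset: two edges and one node are added.
new_edges = old_edges + [("c", "d"), ("d", "a")]
cold_rank, cold_iters = pagerank(new_edges)                 # full re-iteration
warm_rank, warm_iters = pagerank(new_edges, init=old_rank)  # reuses old result
print(cold_iters, warm_iters)
```

Both runs converge to the same ranks up to the tolerance; the warm start only changes the starting point. Any saving in iterations depends on how much of the dataset changed, and this sketch makes no claim to reproduce the speedups reported in the abstract.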