Non-MapReduce computing for intelligent big data analysis

Xudong Sun, Lingxiang Zhao, Jiaqi Chen, Yongda Cai, Dingming Wu, Joshua Zhexue Huang

Published: 01 Jan 2024, Last Modified: 06 Feb 2025, Venue: Eng. Appl. Artif. Intell. 2024, License: CC BY-SA 4.0

Abstract: MapReduce is a popular paradigm in distributed computing, but it is inefficient when executing iterative algorithms over a big distributed dataset because of its heavy data communication overhead. Non-MapReduce computing is an alternative that improves computing efficiency and data scalability when iterative algorithms are used to process big distributed datasets on clusters. In this paper, we investigate the Non-MapReduce approach to distributed computing. Using Spark implementations of machine learning algorithms, we discuss the problems MapReduce encounters in executing iterative algorithms over a big distributed dataset and the advantages of Non-MapReduce for the same tasks. We present a method for building a new machine learning library for distributed computing out of sequential algorithms. Experimental results compare the computing efficiency and data scalability of MapReduce and Non-MapReduce in executing six machine learning algorithms over big datasets.
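The communication argument in the abstract can be illustrated with a small single-process sketch. It is not the authors' method or library: the function names, the toy one-parameter regression task, and the choice to combine local models by simple averaging are all assumptions made for illustration. The MapReduce-style loop needs one aggregation round per iteration, while the Non-MapReduce-style variate runs a sequential algorithm to completion on each data block and combines the results once.

```python
import random


def make_blocks(n_blocks=4, block_size=200, true_w=3.0, seed=0):
    """Create hypothetical data blocks for y = true_w * x + noise."""
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_blocks):
        block = []
        for _ in range(block_size):
            x = rng.uniform(-1.0, 1.0)
            y = true_w * x + rng.gauss(0.0, 0.1)
            block.append((x, y))
        blocks.append(block)
    return blocks


def gradient_sum(block, w):
    """Sum of the gradients of 0.5 * (w * x - y) ** 2 over one block."""
    return sum((w * x - y) * x for x, y in block)


def mapreduce_style(blocks, iters=100, lr=0.5):
    """Every iteration maps over all blocks, then reduces to update w."""
    n = sum(len(b) for b in blocks)
    w, rounds = 0.0, 0
    for _ in range(iters):
        partials = [gradient_sum(b, w) for b in blocks]  # map step
        w -= lr * sum(partials) / n                      # reduce + update
        rounds += 1                                      # one aggregation per iteration
    return w, rounds


def sequential_gd(block, iters=100, lr=0.5):
    """Plain sequential gradient descent on a single block."""
    w = 0.0
    for _ in range(iters):
        w -= lr * gradient_sum(block, w) / len(block)
    return w


def non_mapreduce_style(blocks, iters=100, lr=0.5):
    """Assumed scheme: fit each block locally, then average once."""
    local_models = [sequential_gd(b, iters, lr) for b in blocks]
    return sum(local_models) / len(local_models), 1      # single combine round


blocks = make_blocks()
w_mr, rounds_mr = mapreduce_style(blocks)
w_nmr, rounds_nmr = non_mapreduce_style(blocks)
print("MapReduce-style:", w_mr, "aggregation rounds:", rounds_mr)
print("Non-MapReduce-style:", w_nmr, "aggregation rounds:", rounds_nmr)
```

On a real cluster each aggregation round implies data movement between workers and the driver, so the round count is only a rough proxy for the overhead the abstract refers to. Averaging local models is also only one possible way to combine them and is not taken from the paper.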