An alternative C++-based HPC system for Hadoop MapReduce

Published: 25 Jul 2022, Last Modified: 04 Mar 2025, De Gruyter Open Access, CC BY 4.0
Abstract: MapReduce (MR) is a programming model for distributed data processing that can greatly speed up large-scale computation. Hadoop's MR implementation is written in Java and runs on the memory-intensive Java Virtual Machine (JVM). An MR framework based on High-Performance Computing (HPC) could be both more memory-efficient and faster than standard MR. This article explores a C++-based approach to MR and evaluates its feasibility in terms of developer friendliness, deployment interface, efficiency, and scalability. It also introduces two techniques, Eager Reduction and Delayed Reduction, to speed up MR.