A Modeling Language for MapReduce Programing in a Storage System Perspective

Published: 2018, Last Modified: 13 May 2025. J. Signal Process. Syst. 2018. CC BY-SA 4.0
Abstract: MapReduce is a powerful programming model for distributed data analysis. It runs on big data storage systems and processes data in parallel. Formal analysis is an appropriate way to ensure the correctness of MapReduce programs, but it first requires a formal model of MapReduce. In this paper we propose a modeling language for establishing a formal model of the MapReduce framework. Unlike other approaches, our language describes data processing in MapReduce programs from the perspective of the underlying files and blocks, so that the details of data processing can be shown clearly. The language builds on our previous work, a language for describing the management of massive data storage systems, and extends it in two respects: refinement of block content data and support for concurrency. Using our language, the features of the MapReduce programming model can then be discussed.
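To make the file-and-block view concrete, the following is a minimal Python sketch of a word count in which files are split into blocks, each block is mapped concurrently, and the results are grouped and reduced. It is only an illustration of the general MapReduce idea under stated assumptions and is not the modeling language proposed in the paper: the function names (`split_into_blocks`, `map_block`, `shuffle`, `reduce_group`, `word_count`), the word-based block layout, and the use of a thread pool are all hypothetical choices made for this example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(content: str, block_size: int) -> list[str]:
    """Split file content into blocks of at most `block_size` words.

    Assumption: real storage systems split by bytes; words keep the sketch simple.
    """
    words = content.split()
    return [" ".join(words[i:i + block_size])
            for i in range(0, len(words), block_size)]


def map_block(block: str) -> list[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one block."""
    return [(word, 1) for word in block.split()]


def shuffle(pairs_per_block: list[list[tuple[str, int]]]) -> dict[str, list[int]]:
    """Group the emitted values by key across all blocks."""
    groups: dict[str, list[int]] = defaultdict(list)
    for pairs in pairs_per_block:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_group(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: sum the counts collected for one key."""
    return key, sum(values)


def word_count(files: dict[str, str], block_size: int = 4) -> dict[str, int]:
    """Run the sketch over a set of in-memory 'files' (name -> content)."""
    blocks = [block
              for content in files.values()
              for block in split_into_blocks(content, block_size)]
    # Blocks are independent, so the map step can run concurrently.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_block, blocks))
    groups = shuffle(mapped)
    return dict(reduce_group(key, values) for key, values in groups.items())
```

For instance, calling `word_count({"a.txt": "x y x", "b.txt": "y z"})` counts words across both files regardless of how their contents were divided into blocks.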