Compress Blocks or Not: Tradeoffs for Energy Consumption of a Big Data Processing System

Published: 2022, Last Modified: 01 Apr 2026. IEEE Trans. Sustain. Comput. 2022. CC BY-SA 4.0
Abstract: In addition to performance, the energy consumption (hereinafter EC) of jobs running in a big data processing system is also of interest to academia and industry because it grows rapidly as an increasing amount of data is processed. Many studies focus on the EC optimization of jobs from the perspective of computation, which is specific to the algorithms in each job. However, the part of EC involved in I/O operations, which is common to all jobs, is mostly ignored in optimization. In this paper, we concentrate on the EC optimization of jobs from the perspective of I/O operations. To save energy, we argue that data compression could be exploited. On the one hand, energy is saved by processing compressed data with less I/O cost. On the other hand, extra EC is incurred by the necessary data compression/decompression process, which may offset the saved energy. Therefore, there are tradeoffs to consider when determining whether to compress data for these jobs. We study these tradeoffs and their boundary conditions. We first abstract a paradigm for the runtime environment of big data processing jobs. Then, we establish the power, job, compression, and I/O models in detail. Based on these models, we discuss the compression tradeoffs and derive the boundary conditions for EC optimization. Finally, we design and conduct experiments to validate our proposition. The experimental results confirm that the tradeoffs and boundary conditions exist for typical jobs in MapReduce and Spark. Specifically, first, the EC of a job can be reduced using data compression. Second, whether or not such optimization occurs depends on the specifications of both the compression algorithm and the job and is determined by the corresponding boundary conditions. Third, for a compression algorithm, the higher its compression/decompression speed and the better its compression ratio, the more likely it is to achieve EC optimization.
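The tradeoff described in the abstract can be illustrated with a deliberately simplified sketch. This is not the paper's power, job, compression, or I/O model, which the abstract does not spell out. It assumes a single read of a block, constant I/O and CPU power draws, and a codec characterized only by its compression ratio and decompression speed. All names (`Codec`, `read_energy_plain`, `read_energy_compressed`, `compression_saves_energy`) and all numbers are hypothetical, chosen only to show how a boundary condition of this kind might arise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Codec:
    """Hypothetical codec description (assumption, not from the paper)."""
    name: str
    ratio: float           # compressed size / original size, in (0, 1]
    decompress_bps: float  # bytes of original data restored per second


def read_energy_plain(size_bytes: float, io_bps: float, io_watts: float) -> float:
    """Energy (J) to read an uncompressed block: power * transfer time."""
    return io_watts * size_bytes / io_bps


def read_energy_compressed(size_bytes: float, io_bps: float, io_watts: float,
                           cpu_watts: float, codec: Codec) -> float:
    """Energy (J) to read the compressed block and then decompress it."""
    io_energy = io_watts * (codec.ratio * size_bytes) / io_bps
    cpu_energy = cpu_watts * size_bytes / codec.decompress_bps
    return io_energy + cpu_energy


def compression_saves_energy(io_bps: float, io_watts: float,
                             cpu_watts: float, codec: Codec) -> bool:
    """Toy boundary condition: decompression energy per byte must be
    smaller than the I/O energy per byte avoided by reading less data."""
    return cpu_watts / codec.decompress_bps < io_watts * (1.0 - codec.ratio) / io_bps


# Made-up operating point: 1 GB block, 100 MB/s I/O at 10 W, CPU at 50 W.
SIZE, IO_BPS, IO_W, CPU_W = 1e9, 100e6, 10.0, 50.0
fast = Codec("fast-codec", ratio=0.5, decompress_bps=2e9)
slow = Codec("slow-codec", ratio=0.4, decompress_bps=50e6)

for codec in (fast, slow):
    plain = read_energy_plain(SIZE, IO_BPS, IO_W)
    packed = read_energy_compressed(SIZE, IO_BPS, IO_W, CPU_W, codec)
    verdict = compression_saves_energy(IO_BPS, IO_W, CPU_W, codec)
    print(f"{codec.name}: plain={plain:.1f} J, compressed={packed:.1f} J, saves={verdict}")
```

In this toy setting the slower codec loses despite its better ratio, because the decompression energy outweighs the I/O energy it avoids. That matches the direction of the abstract's third finding, although the actual boundary conditions in the paper may include terms this sketch omits, such as compression of job output.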