LShape Partitioning: Parallel Skyline Query Processing Using MapReduce

Heri Wijayanto, Wenlu Wang, Wei-Shinn Ku, Arbee L. P. Chen

Published: 2022, Last Modified: 11 Nov 2023. IEEE Trans. Knowl. Data Eng. 2022. Readers: Everyone

Abstract: A skyline query retrieves the data points that are not dominated by any other point in the dataset. It is widely adopted in applications that require multi-criteria decision making. However, skyline query processing is considerably time-consuming for high-dimensional, large-scale datasets. Parallel computing techniques are therefore needed to address this challenge, among which MapReduce is one of the most popular frameworks for processing big data. Many efficient MapReduce skyline algorithms have been proposed in the literature, and most of their designs focus on partitioning and pruning the given dataset. However, there are still opportunities for further parallelism. In this study, we propose two parallel skyline processing algorithms using a novel LShape partitioning strategy and an effective Propagation Filtering method. These two algorithms are 2Phase LShape and 1Phase LShape, used for multiple reducers and a single reducer, respectively. Through extensive experiments, we verify that our algorithms outperform the state-of-the-art approaches, especially for high-dimensional, large-scale datasets.
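
To make the dominance definition in the abstract concrete, the following is a minimal sketch of a naive skyline computation, followed by a generic partition-then-merge step of the kind that MapReduce skyline algorithms build on. It is not the LShape partitioning or Propagation Filtering method, whose details the abstract does not give. The function names, the round-robin partitioning, and the convention that smaller values are preferred in every dimension are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """Return True if p dominates q.

    Assumption: smaller is better in every dimension. p dominates q when
    p is no worse than q everywhere and strictly better somewhere.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: Sequence[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def partitioned_skyline(points: Sequence[Point], num_partitions: int) -> List[Point]:
    """Generic partition-then-merge skyline (hypothetical helper).

    "Map" step: compute a local skyline per partition (round-robin split,
    chosen here only for simplicity).
    "Reduce" step: compute the skyline of the union of local skylines.
    Any global skyline point survives its local skyline, so the merge
    returns the same set of points as skyline(points).
    """
    partitions = [points[i::num_partitions] for i in range(num_partitions)]
    local_skylines = [skyline(part) for part in partitions]
    candidates = [p for local in local_skylines for p in local]
    return skyline(candidates)


if __name__ == "__main__":
    data = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
    print(skyline(data))
    print(sorted(partitioned_skyline(data, num_partitions=2)))
```

In this sketch the local skylines act as a pruning step: points eliminated inside a partition never reach the merge, which is the general reason partitioning and pruning reduce the work of the final reducer.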
