Highly Efficient Parallel Framework: A Divide-and-Conquer Approach

Published: 01 Jan 2015, Last Modified: 04 Mar 2025DEXA (2) 2015EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Coupling a database and a parallel-programming framework reduces the I/O overhead between them. However, there will be serious issues such as memory bandwidth limitations, load imbalances, and race conditions. Existing frameworks such as MapReduce do not resolve these problems because they adopt flat parallelization, i.e., partitioning a task without regard to its structure. In this paper, we propose a recursive divide-and-conquer-based method for spatial databases which supports high-throughput machine learning. Our approach uses a tree-based task structure, which improves the reference locality, and load balancing is realized by setting the grain size of tasks dynamically. Race conditions are also avoided. We applied our method to the task of learning a hierarchical Poisson mixture model. The results show that our approach achieves strong scalability and robustness against load-imbalanced datasets.
Loading