Computing the Split Points for Learning Decision Tree in MapReduce

Published: 2013, Last Modified: 17 Apr 2025, Venue: DASFAA (2) 2013, License: CC BY-SA 4.0
Abstract: The explosive growth of data is bringing both challenges and opportunities to data mining. Decision tree learning is a common data mining method, and its key problem is determining split points. Existing methods for calculating split points on large data in a distributed setting either (1) incur high communication overhead or (2) do not generalize across different levels of skew in the data distribution. In this paper, we study the properties of Gini impurity, a measure used to determine split points, and design new algorithms for calculating split points in MapReduce. Empirical evaluation demonstrates that our method outperforms existing state-of-the-art techniques in communication cost and in universality across skew levels.
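For readers unfamiliar with the measure, the following is a minimal single-machine sketch of how Gini impurity is commonly used to choose a split point on one numeric attribute. It is not the MapReduce algorithm proposed in the paper, which the abstract does not describe; the function names `gini` and `best_split` and the exhaustive scan over candidate thresholds are assumptions made purely for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best binary split.

    Illustrative helper (hypothetical, not from the paper): tries the
    midpoint between every pair of adjacent distinct sorted values and
    keeps the one with the lowest size-weighted Gini impurity.
    Returns (None, inf) if no split is possible.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best_threshold, best_score = None, float("inf")
    for i in range(1, n):
        # Equal adjacent values cannot be separated by a threshold.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_threshold, best_score
```

This exhaustive scan needs all values of the attribute in one place, which is the kind of cost a distributed method has to avoid: shipping every record to a single node is one way communication overhead can become high.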