Abstract: Decision trees are among the most commonly used tools in data mining. Most popular induction algorithms construct decision trees in a top-down manner. These algorithms generally select the splitting feature based only on the current node's data, ignoring history information. Such approaches must search the whole feature space when splitting each node, which is time-consuming in high-dimensional cases. To tackle this problem, we propose an impurity-based heuristic schema (IBH) that uses history information to accelerate existing top-down induction algorithms. Specifically, when a child node's impurity is smaller than its parent node's, IBH takes a feature's performance in the parent node as a pseudo upper bound on its performance in the child node, cutting down unpromising computation. The feature selection of IBH is biased toward features that perform better in parent nodes. Both mathematical analysis and experimental results demonstrate the coherence between IBH and the original induction algorithms. Experiments show that IBH can significantly reduce induction time without accuracy degradation, both for decision trees and for related ensemble methods.
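The pruning idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the evaluation order or the exact stopping rule, so the descending sort by parent score, the early `break`, and all names (`select_feature_ibh`, `parent_scores`, `score_in_child`) are assumptions made for illustration only.

```python
from typing import Callable, Dict, Hashable, Optional, Tuple


def select_feature_ibh(
    parent_scores: Dict[Hashable, float],
    score_in_child: Callable[[Hashable], float],
    parent_impurity: float,
    child_impurity: float,
) -> Tuple[Optional[Hashable], float, int]:
    """Return (best_feature, best_score, number_of_features_evaluated).

    Hypothetical helper: parent_scores maps each feature to its split score
    (e.g. impurity reduction) in the parent node; score_in_child computes
    the same score on the child node's data.
    """
    # Assumption: visit features in order of decreasing parent score, so
    # features that did well in the parent are tried first.
    order = sorted(parent_scores, key=parent_scores.get, reverse=True)

    # Per the abstract, the bound is only used when the child is purer
    # than its parent; otherwise fall back to an exhaustive search.
    use_bound = child_impurity < parent_impurity

    best_feature: Optional[Hashable] = None
    best_score = float("-inf")
    evaluated = 0
    for feature in order:
        # Treat the parent score as a pseudo upper bound on the child score.
        # Because of the descending order, once one feature cannot beat the
        # current best, no remaining feature can (under this heuristic).
        if use_bound and parent_scores[feature] <= best_score:
            break
        score = score_in_child(feature)
        evaluated += 1
        if score > best_score:
            best_feature, best_score = feature, score
    return best_feature, best_score, evaluated
```

Because the bound is only a heuristic ("pseudo"), this sketch can stop before reaching a feature that would have scored higher in the child; the trade is fewer score computations per node in exchange for a selection biased toward features that performed well in the parent.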
External IDs: dblp:conf/ideal/LiuLZS16