Abstract: Ensemble learning is a machine learning approach in which multiple learners are trained to solve the same problem. Random Forest is an ensemble learning algorithm that comprises numerous decision trees and predicts by majority voting for classification and by averaging for regression. Prior research affirms that the learning time of the Random Forest algorithm increases linearly with the number of trees in the forest. A large number of decision trees raises two challenges: it increases model complexity, and it reduces efficiency on large-scale datasets. Hence, ensemble pruning methods (e.g. Clustering Ensemble Selection (CES)) have been devised to select a subset of decision trees from the forest. The main limitation of prior CES models is that they require the number of clusters as input. To solve this problem, we devise an Automatic CES pruning model (Auto-CES) for Random Forest that automatically finds a proper number of clusters. Our proposed model is able to obtain an optimal subset of trees that provides the same or even better effectiveness compared to the original set. Auto-CES has two components: clustering and selection. In the clustering part, our algorithm uses a new clustering technique to group homogeneous trees. In the selection part, it takes both the accuracy and the diversity of the trees into consideration to choose the best tree.
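The two-stage idea of clustering followed by selection can be illustrated with a minimal sketch. It does not reproduce the Auto-CES clustering technique, which the abstract does not specify. As a stand-in, trees are grouped by single-linkage merging on pairwise disagreement with a fixed threshold, so the number of clusters emerges from the data rather than being supplied. The function names, the `threshold` and `diversity_weight` parameters, the additive score, the choice of one tree per cluster, and the toy predictions are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def disagreement(p, q):
    """Fraction of validation samples on which two trees predict differently."""
    return sum(a != b for a, b in zip(p, q)) / len(p)


def accuracy(p, y):
    """Fraction of validation samples a tree labels correctly."""
    return sum(a == b for a, b in zip(p, y)) / len(y)


def cluster_trees(preds, threshold):
    """Stand-in clustering (assumption): merge trees whose pairwise
    disagreement is <= threshold. The cluster count is not an input."""
    parent = list(range(len(preds)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(preds)), 2):
        if disagreement(preds[i], preds[j]) <= threshold:
            parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(preds)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def select_representatives(preds, y, clusters, diversity_weight=0.5):
    """Pick one tree per cluster (assumption), scoring each candidate by
    accuracy plus weighted mean disagreement with trees outside its cluster."""
    chosen = []
    for members in clusters:
        outside = [k for k in range(len(preds)) if k not in members]

        def score(i):
            if not outside:
                return accuracy(preds[i], y)
            div = sum(disagreement(preds[i], preds[k]) for k in outside) / len(outside)
            return accuracy(preds[i], y) + diversity_weight * div

        chosen.append(max(members, key=score))
    return chosen


def majority_vote(preds, chosen):
    """Combine the selected trees by majority voting, sample by sample."""
    n_samples = len(preds[0])
    return [Counter(preds[i][s] for i in chosen).most_common(1)[0][0]
            for s in range(n_samples)]


# Hypothetical validation labels and per-tree predictions (not from the paper).
y_val = [0, 1, 1, 0, 1, 0]
tree_preds = [
    [0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 1, 0],
]

clusters = cluster_trees(tree_preds, threshold=0.2)
selected = select_representatives(tree_preds, y_val, clusters)
pruned_prediction = majority_vote(tree_preds, selected)
```

On this toy data the threshold yields three clusters from five trees, and the pruned ensemble of three trees still recovers the validation labels. In practice the threshold itself would need tuning, which is the kind of manual input an automatic method aims to avoid.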