Feature Ranking from Random Forest Through Complex Network's Centrality Measures - A Robust Ranking Method Without Using Out-of-Bag Examples
Abstract: The volume of available data has increased rapidly in recent years. As a consequence, datasets commonly end up with many irrelevant features. This excess can hinder human understanding and even lead to poor machine learning models. This research proposes a novel feature ranking method that uses the trees of a Random Forest to transform a dataset into a complex network, to which centrality measures are then applied to rank the features. Each tree is represented as a graph in which every feature used by the tree is a vertex, and each link between two nodes of the tree (parent $\rightarrow$ child) is represented by a weighted edge between the two corresponding vertices. The union of the graphs from all individual trees yields the complex network. Three centrality measures are then applied to rank the features in the complex network. To evaluate the method, experiments were performed on eighty-five supervised classification datasets with varying levels of feature noise. Results show that centrality measures in undirected complex networks are comparable to, and may be correlated with, the Random Forest's variable importance ranking algorithm. Vertex strength and eigenvector centrality outperformed the Random Forest ranking on the datasets with 40% noise, although the difference was not statistically significant at the 95% confidence level.
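The construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how edges are weighted, so the sketch assumes each parent-child link adds 1 to the weight of an undirected edge, and it covers only the two centrality measures the abstract names (vertex strength and eigenvector centrality). The function names are hypothetical. Trees are passed as `(feature, children_left, children_right)` arrays, the layout scikit-learn exposes on a fitted tree's `tree_` attribute, where a missing child is marked with `-1`.

```python
import math


def build_feature_network(trees, n_features):
    """Union of per-tree graphs as a symmetric weight matrix.

    Assumption: every parent -> child link between two split nodes adds 1
    to the undirected edge joining the features tested at those nodes.
    """
    weights = [[0.0] * n_features for _ in range(n_features)]
    for feature, left, right in trees:
        for node in range(len(feature)):
            for child in (left[node], right[node]):
                # Skip missing children and leaf children (a leaf tests no feature).
                if child == -1 or left[child] == -1:
                    continue
                a, b = feature[node], feature[child]
                weights[a][b] += 1.0
                if a != b:  # a self-loop is counted once
                    weights[b][a] += 1.0
    return weights


def vertex_strength(weights):
    """Strength of a vertex: sum of the weights of its incident edges."""
    return [sum(row) for row in weights]


def eigenvector_centrality(weights, max_iter=1000, tol=1e-10):
    """Power iteration on W + I (the identity shift helps convergence)."""
    n = len(weights)
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(max_iter):
        nxt = [x[i] + sum(weights[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        nxt = [v / norm for v in nxt]
        if max(abs(p - q) for p, q in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


def rank_features(scores):
    """Feature indices ordered from most to least central."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


# Toy forest with two hand-written trees over 4 features (feature 3 is never used).
tree_a = ([0, 1, 2, -2, -2, -2, -2], [1, 3, 5, -1, -1, -1, -1], [2, 4, 6, -1, -1, -1, -1])
tree_b = ([0, 1, -2, -2, -2], [1, 3, -1, -1, -1], [2, 4, -1, -1, -1])

network = build_feature_network([tree_a, tree_b], n_features=4)
print(rank_features(vertex_strength(network)))
print(rank_features(eigenvector_centrality(network)))
```

With a fitted scikit-learn forest, the same arrays could be read from each estimator's `tree_.feature`, `tree_.children_left`, and `tree_.children_right`. A feature that no tree splits on ends up as an isolated vertex with zero strength, which is how irrelevant features would sink to the bottom of the ranking under these assumptions.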