Good Trees: Pruning Random Forests Without Compromise

TMLR Paper 3962 Authors

14 Jan 2025 (modified: 27 Mar 2025) · Rejected by TMLR · CC BY 4.0
Abstract: Random forests are a powerful ensemble-based machine learning tool that combines multiple decision trees into a stronger predictor than any individual tree. Because each random forest prediction uses every tree's prediction, prediction cost scales linearly with ensemble size. Motivated by the idea that an intelligently selected subset of trees can perform comparably to the random forest it is derived from, we propose two novel algorithms to reduce the number of trees in a forest without compromising performance: tree slices and pruned groups. Tree slices select all trees within a specific performance range, while pruned groups use a representative sample of the forest to make a weighted approximation of the forest's performance. While performance varies between datasets, the results are promising and suggest that certain workflows can be greatly improved by these techniques.
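To make the two selection schemes concrete, here is a minimal sketch of how they might look, assuming each tree has a precomputed validation accuracy. The function names, the accuracy-weighted voting in `pruned_group`, and the toy forest are illustrative assumptions, not the paper's actual implementation.

```python
import random

def tree_slice(trees, accuracies, lo, hi):
    """Tree slices: keep every tree whose validation accuracy
    falls inside the performance range [lo, hi]."""
    return [t for t, a in zip(trees, accuracies) if lo <= a <= hi]

def pruned_group(trees, accuracies, k, seed=0):
    """Pruned groups (sketch): draw a representative sample of k trees
    and weight each by its normalized validation accuracy, so the
    subset's weighted vote approximates the full forest's behavior."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(trees)), k)
    total = sum(accuracies[i] for i in idx)
    return [trees[i] for i in idx], [accuracies[i] / total for i in idx]

# Toy forest: trees are stand-in labels with per-tree validation accuracy.
trees = ["t0", "t1", "t2", "t3", "t4"]
accs = [0.62, 0.71, 0.90, 0.74, 0.55]

kept = tree_slice(trees, accs, 0.70, 0.80)   # mid-performance slice
subset, weights = pruned_group(trees, accs, k=3)
```

With real forests (e.g. scikit-learn's `RandomForestClassifier`), `trees` would be the fitted `estimators_` list and the accuracies would come from a held-out validation set.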
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Andres_R_Masegosa1
Submission Number: 3962
