Compressing tree ensembles through Level-wise Optimization and Pruning

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We propose a method that can reduce the size of decision tree ensembles by orders of magnitude at a negligible cost in accuracy.
Abstract: Tree ensembles (e.g., gradient boosting decision trees) are often used in practice because they offer excellent predictive performance while still being easy and efficient to learn. In some contexts, it is important to also optimize their size: this is specifically the case when models need to have verifiable properties (verification of fairness, robustness, etc. is often exponential in the ensemble's size), or when models run on battery-powered devices (smaller ensembles consume less energy, increasing battery autonomy). For this reason, compression of tree ensembles is worth studying. This paper presents LOP, a method for compressing a given tree ensemble by pruning or entirely removing trees in it, while updating leaf predictions in such a way that predictive accuracy is mostly unaffected. Empirically, LOP achieves compression factors that are often 10 to 100 times better than those of competing methods.
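
The abstract only describes LOP at a high level: prune or remove trees, then re-optimize the leaf predictions of what remains. As a rough, hypothetical illustration of that general recipe (not of LOP itself, whose level-wise optimization is detailed in the paper and the linked repository), the Python sketch below keeps an arbitrary subset of trees from a scikit-learn gradient-boosted ensemble and refits all their leaf values jointly with ridge regression so that the smaller model imitates the full ensemble; the tree-selection rule, the ridge penalty, and the use of the full ensemble's predictions as the refitting target are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.preprocessing import OneHotEncoder

# Train a deliberately large ensemble on synthetic data.
X, y = make_regression(n_samples=2000, n_features=20, noise=1.0, random_state=0)
full = GradientBoostingRegressor(n_estimators=100, max_depth=3, random_state=0).fit(X, y)

# "Prune" by keeping only every 10th tree -- a crude stand-in for a
# principled tree-selection step (this is NOT how LOP selects trees).
kept = list(range(0, 100, 10))

# Record which leaf each training example reaches in every kept tree,
# and one-hot encode those leaf memberships.
leaf_ids = full.apply(X)[:, kept]          # shape: (n_samples, n_kept_trees)
indicators = OneHotEncoder(handle_unknown="ignore").fit_transform(leaf_ids)

# Refit all leaf values of the kept trees jointly (ridge regression on the
# leaf indicators) so the small ensemble imitates the full ensemble's output.
refit = Ridge(alpha=1.0).fit(indicators, full.predict(X))
print("R^2 of 10-tree surrogate vs. full 100-tree ensemble:",
      round(refit.score(indicators, full.predict(X)), 3))
```

The point of the leaf-refitting step, in this sketch as in the paper's setting, is that the surviving trees can absorb part of the contribution of the removed ones, which is why aggressive pruning need not cost much accuracy.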
Lay Summary: Larger machine learning models tend to have better predictive performance. However, in some situations, like running on a phone with limited battery or ensuring fair predictions, it’s important to make these models smaller: smaller models are easier to provide fairness guarantees for and use less energy. In this paper, we introduce a novel method called LOP to make a specific type of machine learning model much smaller: tree ensembles. LOP does this by removing parts of the model that have minimal effect on its predictions. In our experiments, LOP shrank models 10 to 100 times more effectively than other approaches.
Link To Code: https://github.com/ML-KULeuven/lop_compress
Primary Area: General Machine Learning->Supervised Learning
Keywords: ensembles, decision forests, model efficiency, energy efficiency, verification, compression
Submission Number: 16090