Abstract: Gradient boosted tree ensembles (GBTEs) such as XGBoost continue to outperform other machine learning models on tabular data. However, the large number of adjustable hyperparameters can complicate optimisation, especially in regression tasks, which lack intuitive performance measures such as accuracy or confidence. Automated machine learning frameworks relieve users of the hyperparameter search, but if the optimisation procedure ends prematurely due to resource constraints, it is questionable whether users receive good models. To tackle this problem, we introduce a cost-efficient method to retrofit previously optimised XGBoost models by retraining them with a new weight distribution over the training instances. We base our approach on topological results, which allow us to infer model-agnostic weights for specific regions of the data distribution where the targets are more susceptible to input perturbations. By linking our theory to the training procedure of XGBoost regressors, we then establish a topologically consistent reweighting scheme, which is independent of the specific model instance. Empirically, we verify that our approach improves prediction performance, outperforms other reweighting methods, and is much faster than a hyperparameter search. To enable users to find the optimal weights for their data, we provide guidelines based on our findings across 20 datasets. Our code is available at https://github.com/montymaxzuehlke/tcr.
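To make the retrofitting step concrete, the following Python sketch retrains a previously tuned `xgboost.XGBRegressor` under a new per-instance weight distribution via the `sample_weight` argument of `fit`. The dataset, hyperparameter values, and the weight heuristic are illustrative stand-ins, not the topologically derived weights of the paper (those are constructed in the method sections and the linked repository).

```python
# A minimal sketch of reweighted retraining, assuming a previously tuned
# XGBoost regressor. The weight vector below is a hypothetical heuristic,
# not the paper's topologically consistent weighting scheme.
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Hyperparameters found by a prior (possibly truncated) search; placeholder values.
tuned_params = {"n_estimators": 200, "max_depth": 4, "learning_rate": 0.1}

# Baseline model from the original (unweighted) optimisation run.
baseline = xgb.XGBRegressor(**tuned_params)
baseline.fit(X_train, y_train)

# Stand-in weights: upweight instances whose targets lie far from the mean,
# as a crude proxy for regions more susceptible to input perturbations.
weights = 1.0 + np.abs(y_train - y_train.mean()) / y_train.std()

# Retrofit: retrain with the same hyperparameters but a new weight distribution.
retrofitted = xgb.XGBRegressor(**tuned_params)
retrofitted.fit(X_train, y_train, sample_weight=weights)
```

The key point is that the tuned hyperparameters are reused unchanged; only the weight distribution over training instances is replaced, which keeps the retrofit far cheaper than rerunning the hyperparameter search.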