Keywords: cost-sensitive classification, selective prediction, decision-focused learning, oblique decision forests, distribution shift, covariate shift, label noise, missing data, MCAR missingness
Abstract: Decision forests are widely used for tabular data due to their efficiency and strong performance, but they typically optimize accuracy under i.i.d. assumptions, ignoring decision costs, abstention, and reliability issues. We introduce SCARF (Selective Cost-Aware Random Forests), a framework for learning from unreliable data that (i) learns a global feature transform using finite-difference sensitivities, (ii) trains a standard forest on the transformed features, and (iii) calibrates a selective-prediction threshold to meet a target error rate on non-abstained samples (kept-error). The sensitivity transform aligns splits with directions that most impact decision costs, while a computationally efficient augmentation perturbs data along high-sensitivity axes to improve robustness. On public credit-risk datasets subjected to covariate shift, Missing Completely At Random (MCAR) missingness, and label noise, SCARF reduces policy cost by 11-15\% while maintaining 83-88\% coverage at a 10\% target kept-error, outperforming strong boosted-tree and oblique-forest baselines. Ablations indicate complementary contributions from the finite-difference-based transform, selective calibration, and sensitivity-guided augmentation. These results highlight a simple path to making tree ensembles decision-aware and deployable in unreliable settings.
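To make the three-stage pipeline concrete, below is a minimal, hypothetical sketch of how steps (i)-(iii) could fit together. It is not the authors' implementation: the diagonal sensitivity scaling, the pilot model, the helper names (`finite_difference_sensitivities`, `calibrate_threshold`), and all hyperparameters are illustrative assumptions, and the sensitivity-guided augmentation is omitted.

```python
# Hypothetical SCARF-style pipeline (illustrative sketch, not the submitted method):
# (i)   a global diagonal feature scaling from finite-difference sensitivities of a pilot model,
# (ii)  a standard random forest trained on the transformed features,
# (iii) a confidence threshold calibrated so kept-error <= target on held-out data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def finite_difference_sensitivities(model, X, h=1e-2):
    """Average |dP(y=1)/dx_j| estimated by central finite differences."""
    sens = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        Xp, Xm = X.copy(), X.copy()
        Xp[:, j] += h
        Xm[:, j] -= h
        sens[j] = np.mean(np.abs(model.predict_proba(Xp)[:, 1]
                                 - model.predict_proba(Xm)[:, 1])) / (2 * h)
    return sens


def calibrate_threshold(probs, y_true, target_kept_error=0.10):
    """Lowest confidence threshold whose error on kept (non-abstained) samples <= target."""
    conf = np.max(probs, axis=1)
    pred = np.argmax(probs, axis=1)
    for t in np.unique(conf):          # ascending, so the first feasible t maximizes coverage
        keep = conf >= t
        if keep.any() and np.mean(pred[keep] != y_true[keep]) <= target_kept_error:
            return t
    return 1.0                         # abstain on everything if the target is unreachable


# Synthetic stand-in data; the paper uses public credit-risk datasets.
X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

pilot = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
scale = finite_difference_sensitivities(pilot, X_tr)                 # (i) sensitivity transform
scale = scale / (scale.max() + 1e-12)

forest = RandomForestClassifier(n_estimators=300, random_state=0)
forest.fit(X_tr * scale, y_tr)                                       # (ii) forest on transformed features

probs = forest.predict_proba(X_va * scale)
tau = calibrate_threshold(probs, y_va, target_kept_error=0.10)       # (iii) selective calibration
kept = np.max(probs, axis=1) >= tau
print(f"threshold={tau:.3f}  coverage={kept.mean():.2%}  "
      f"kept-error={np.mean(np.argmax(probs, axis=1)[kept] != y_va[kept]):.2%}")
```

In this reading, the "global feature transform" is the simplest possible choice (a shared per-feature rescaling), chosen only to show where the finite-difference sensitivities enter; the paper's transform and cost model may differ.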
Submission Number: 220