Keywords: Predictive Churn, Stability, Distillation, Ensembles
Abstract: The impact of an ML model is largely a function of how much trust users place in its predictions. As more data is gathered over time, the model can be updated to take advantage of the larger sample size and improve its performance. However, even when an update improves aggregate metrics such as accuracy, it can introduce errors on samples the previous model got right, causing apparent regressions in performance known as predictive churn. Such prediction flips erode user trust, thereby reducing the effectiveness of the human-AI team as a whole.
Current approaches for reducing predictive churn fall mainly into two categories: ensembles and distillation. While ensembles are the most effective, they come at the cost of training and running multiple models at inference time. Distillation is far more efficient in both training and inference, but is much less effective at reducing churn. We propose a missing middle-ground solution, StackMem, based on accumulating models over time, which achieves churn reduction comparable to ensembles without any increase in training time or changes to the training procedure. Additionally, unlike ensembles, StackMem can be applied to models that are already deployed. We demonstrate the effectiveness of StackMem on several computer vision benchmark datasets, comparing against state-of-the-art churn reduction methods.
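For concreteness, a minimal sketch of how the churn quantity described in the abstract is commonly measured between an old and an updated model is given below. This is not the authors' code; the function names and the good/bad flip decomposition are illustrative assumptions based only on the abstract's definition of churn as prediction flips.

```python
import numpy as np

def predictive_churn(old_preds: np.ndarray, new_preds: np.ndarray) -> float:
    """Fraction of samples whose predicted label flips between model versions."""
    return float(np.mean(old_preds != new_preds))

def good_flip_rate(old_preds: np.ndarray, new_preds: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of samples the old model got wrong that the new model corrects."""
    return float(np.mean((old_preds != labels) & (new_preds == labels)))

def bad_flip_rate(old_preds: np.ndarray, new_preds: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of samples the old model got right that the new model breaks
    (the 'apparent regressions' the abstract refers to)."""
    return float(np.mean((old_preds == labels) & (new_preds != labels)))

# Example with dummy predictions: the update fixes one sample but breaks another,
# so accuracy is unchanged while churn is nonzero.
labels    = np.array([0, 1, 1, 0, 2])
old_preds = np.array([0, 1, 0, 0, 2])
new_preds = np.array([0, 1, 1, 1, 2])
print(predictive_churn(old_preds, new_preds))        # 0.4
print(good_flip_rate(old_preds, new_preds, labels))  # 0.2
print(bad_flip_rate(old_preds, new_preds, labels))   # 0.2
```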
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning