Keywords: ML API performance shifts, ML as a service, ML monitoring, ML performance evaluation
Abstract: ML prediction APIs from providers like Amazon and Google have made it simple to use ML in applications. A challenge for users is that such APIs continuously change over time as the providers update models, and changes can happen silently without users knowing. It is thus important to monitor when and how much the MLAPIs’ performance shifts. To provide detailed change assessment, we model MLAPI shifts as confusion matrix differences, and propose a principled algorithmic framework, MASA, to provably assess these shifts efficiently given a sample budget constraint.MASAemploys an upper-confidence bound based approach to adaptively determine on which data point to query the ML API to estimate shifts. Empirically, we observe significant ML API shifts from 2020 to 2021 among 12 out of 36 applications using commercial APIs from Google, Microsoft, Amazon, and other providers. These real-world shifts include both improvements and reductions in accuracy. Extensive experiments show that MASA can estimate such API shifts more accurately than standard approaches given the same budget
One-sentence Summary: We systematically study real world ML APIs' performance shifts due to API updates/retraining and propose a framework to efficiently estimate those shifts.
Supplementary Material: zip