Abstract: Using machine learning (ML) to tackle computer systems tasks is gaining popularity. One shortcoming of such ML-based approaches is the inability of models to generalize to out-of-distribution data, i.e., data whose distribution differs from that of the training dataset. We show that this issue exists in cloud environments by analyzing various ML models used to improve resource balance in Google's fleet. We discuss the trade-offs associated with different techniques for detecting out-of-distribution data. Finally, we propose and demonstrate the efficacy of using Bayesian models to estimate a model's confidence in its output when used to improve cloud server resource balance.