The Importance of Generalizability in Machine Learning for Systems

Published: 01 Jan 2024, Last Modified: 20 May 2025IEEE Comput. Archit. Lett. 2024EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Using machine learning (ML) to tackle computer systems tasks is gaining popularity. One of the shortcomings of such ML-based approaches is the inability of models to generalize to out-of-distribution data i.e., data whose distribution is different than the training dataset. We showcase that this issue exists in cloud environments by analyzing various ML models used to improve resource balance in Google's fleet. We discuss the trade-offs associated with different techniques used to detect out-of-distribution data. Finally, we propose and demonstrate the efficacy of using Bayesian models to detect the model's confidence in its output when used to improve cloud server resource balance.
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview