Clustering-Based Numerosity Reduction for Cloud Workload Forecasting

Published: 01 Jan 2023, Last Modified: 21 Mar 2025, Venue: ALGOCLOUD 2023, License: CC BY-SA 4.0
Abstract: Finding smaller versions of large datasets that preserve the characteristics of the originals is becoming a central problem in Machine Learning, especially when computational resources are limited and energy consumption must be reduced. In this paper, we apply clustering techniques to select a subset of datasets for training models that predict future workload time series in cloud computing. We train Bayesian Neural Networks (BNNs) and state-of-the-art probabilistic models to predict the machine-level distribution of future resource demand, and evaluate them on unseen data from virtual machines in the Google Cloud data centre. Experiments show that selecting the training data via clustering approaches such as Self Organising Maps allows the model to achieve the same accuracy in less than half the time and with less than half the datasets, compared with selecting more data at random. Moreover, BNNs can capture aspects of uncertainty that can better inform scheduling decisions, which state-of-the-art time series forecasting methods cannot do. All the considered models achieve prediction times suitable for real-world scenarios.
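
The idea of choosing training datasets by clustering can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each dataset (for example, one machine's workload trace) has already been summarised as a fixed-length feature vector, and the function names `train_som` and `select_representatives`, the grid size, and the learning schedule are all hypothetical choices made for illustration. A small Self Organising Map is fitted to the feature vectors, and for each occupied map unit the dataset closest to that unit is kept as a representative.

```python
import numpy as np


def train_som(features, grid=(3, 3), epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Fit a tiny Self Organising Map (illustrative settings, not the paper's).

    features: float array of shape (n_datasets, n_features).
    Returns unit weights of shape (rows * cols, n_features).
    """
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid coordinates of each unit, used by the neighbourhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise unit weights from randomly chosen samples.
    weights = features[rng.integers(0, len(features), size=rows * cols)].astype(float)

    n_steps = epochs * len(features)
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(len(features)):
            x = features[idx]
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3     # shrinking neighbourhood radius
            # Best-matching unit: the unit whose weights are closest to x.
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            # Gaussian neighbourhood measured on the map grid.
            grid_dist2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            # Pull the BMU and its grid neighbours towards x.
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def select_representatives(features, weights):
    """For each occupied SOM unit, return the index of the dataset closest to it."""
    # Pairwise distances between datasets and units: shape (n_datasets, n_units).
    dists = np.linalg.norm(features[:, None, :] - weights[None, :, :], axis=2)
    bmus = dists.argmin(axis=1)
    selected = []
    for unit in np.unique(bmus):
        members = np.flatnonzero(bmus == unit)
        selected.append(int(members[dists[members, unit].argmin()]))
    return sorted(selected)
```

With a `rows x cols` map, at most `rows * cols` datasets are selected, so the grid size acts as a rough budget on the training subset. In practice the feature vectors would likely need to be standardised first so that no single feature dominates the distance computation.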