STRATA: Random Forests going Serverless

Published: 01 Jan 2024, Last Modified: 07 Feb 2025, Venue: Middleware 2024, License: CC BY-SA 4.0
Abstract: Serverless computing has received growing interest in recent years for supporting large-scale machine learning tasks. However, training a machine learning model in a serverless environment is nontrivial. Because of the inherent complexity of distributed computation and the coordination that the learning algorithm requires, several challenges still need to be addressed: the data distribution step, the result aggregation step, and the cost of execution. In this work, we focus on Random Forests, a state-of-the-art technique in many machine learning applications. We propose STRATA, a cost-effective middleware for training Random Forests atop a serverless environment that successfully addresses these training challenges. As our extensive experimental evaluation shows, STRATA achieves 3x better training times on average than a centralized approach and can withstand up to 70% of failures during training.