Abstract: We explore the landscape of distributed machine learning, focusing on advancements, challenges, and potential future directions in this rapidly evolving field. We examine the motivation for distributed machine learning, its essential techniques, real-world applications, and open research questions. The theoretical discussion gives an overview of convergence proofs for popular Stochastic Gradient Descent (SGD) algorithms used to train contemporary machine learning models, including deep learning models, under non-convexity assumptions in a distributed setting. We characterize the convergence of data-parallel SGD under various distributed-systems properties, such as asynchronous and compressed communication. We also discuss distributed training techniques, such as model parallelism and tensor parallelism, for training large language models (LLMs).