Abstract: The use of distributed optimization in machine learning can be motivated by either the resulting preservation of privacy or the increase in computational efficiency. On the one hand, training data might be stored across multiple devices. Training a global model within a network where each node only has access to its own confidential data requires the use of distributed algorithms. Even if the data is not confidential, sharing it might be prohibitive due to bandwidth limitations. On the other hand, the ever-increasing amount of available data leads to large-scale machine learning problems. By splitting the training process across multiple nodes, its efficiency can be significantly increased. This paper demonstrates the application of dual decomposition to the distributed training of $ k $-means clustering problems. After an overview of distributed and federated machine learning, the mixed-integer quadratically constrained programming-based formulation of the $ k $-means clustering training problem is presented. The training can be performed in a distributed manner by splitting the data across different nodes and linking these nodes through consensus constraints. Finally, the performance of the subgradient method, the bundle trust method, and the quasi-Newton dual ascent algorithm is evaluated on a set of benchmark problems.
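To illustrate the idea of linking nodes through consensus constraints, the following is a minimal sketch of dual decomposition with a (sub)gradient update on a toy problem. It is not the paper's mixed-integer formulation: it assumes the special case $ k = 1 $, where no binary assignment variables are needed and each node's subproblem has a closed-form solution. The function name `dual_decomposition_mean`, the chain-shaped coupling $ x_i = x_{i+1} $, and the step-size rule are all assumptions made for this illustration.

```python
import numpy as np


def dual_decomposition_mean(local_data, step=None, iters=2000):
    """Toy consensus problem (assumed k = 1 case, hypothetical helper).

    Node i minimizes 0.5 * sum_p ||x_i - p||^2 over its own points p,
    subject to the consensus constraints x_i = x_{i+1} along a chain.
    Only local means and counts are used; raw data is never pooled.
    """
    counts = np.array([len(d) for d in local_data], dtype=float)
    means = np.array([np.mean(d, axis=0) for d in local_data])
    n_nodes, dim = means.shape

    if step is None:
        # Assumed conservative constant step for a chain of nodes.
        step = 0.25 * counts.min()

    # One multiplier vector per consensus constraint x_i - x_{i+1} = 0.
    lam = np.zeros((n_nodes - 1, dim))

    for _ in range(iters):
        # Net multiplier seen by node i: lam_i - lam_{i-1}.
        net = np.zeros((n_nodes, dim))
        net[:-1] += lam
        net[1:] -= lam

        # Each node solves its own subproblem independently (closed form).
        x = means - net / counts[:, None]

        # The constraint violation is the gradient of the dual function.
        residual = x[:-1] - x[1:]
        lam += step * residual

    return x, lam
```

In this toy setting the dual function is smooth, so the update is plain gradient ascent; with the integer assignment variables of the full $ k $-means problem the dual is nonsmooth, which is where subgradient and bundle-type methods come in.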
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Virginia_Smith1
Submission Number: 1131