TL;DR: Conventional distributed learning is only performed inside cluster because of latency requirements. We scale the distributed training across the world under high latency network.
Abstract: Traditional synchronous distributed training is performed inside a cluster, since it requires high bandwidth and low latency network (e.g. 25Gb Ethernet or Infini-band). However, in many application scenarios, training data are often distributed across many geographic locations, where physical distance is long and latency is high. Traditional synchronous distributed training cannot scale well under such limited network conditions. In this work, we aim to scale distributed learning un-der high-latency network. To achieve this, we propose delayed and temporally sparse (DTS) update that enables synchronous training to tolerate extreme network conditions without compromising accuracy. We benchmark our algorithms on servers deployed across three continents in the world: London (Europe), Tokyo(Asia), Oregon (North America) and Ohio (North America). Under such challenging settings, DTS achieves90×speedup over traditional methods without loss of accuracy on ImageNet.
Keywords: Distributed Training, Bandwidth
Original Pdf: pdf