DIANA with Compression for Distributed Variational Inequalities: Eliminating the Need to Transmit Full Gradients
Keywords: Variational inequalities, Compression operators, Convex optimization, Distributed learning
Abstract: Variational inequalities (VIs) are attracting increasing interest among machine learning (ML) researchers due to their applicability in numerous areas, such as empirical risk minimization (ERM), adversarial learning, generative adversarial networks (GANs), and robust optimization. The growing volume of training data necessitates architectures that go beyond single-node computation. Distributed optimization has emerged as the most natural and efficient paradigm, enabling multiple devices to train simultaneously. However, this setup introduces a significant challenge: devices must exchange information with each other, and this communication can substantially slow down training. A standard way to mitigate this issue is to apply compression heuristics that transmit only part of the information. State-of-the-art compression-based methods for distributed VIs rely on variance reduction techniques, which makes them impractical because they require the computation and transmission of full gradients. In this paper, we remove the need for full gradient computations and introduce a novel algorithm for solving distributed variational inequalities. It combines the classical DIANA algorithm with the Extragradient technique. We further incorporate an error compensation mechanism, enabling our algorithm to handle the class of contractive compression operators, which are more practical for real-world applications. We provide a comprehensive theoretical analysis with near-optimal convergence guarantees and show that our method outperforms competitors in CNN and GAN training experiments.
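The abstract does not include pseudocode, so the following is only a minimal illustrative sketch of two ingredients it names: a contractive compression operator (top-k is a standard example) and an error compensation (error feedback) buffer that re-injects the information lost to compression at the next communication round. The names `top_k` and `ErrorFeedbackWorker` are hypothetical and do not come from the paper; this is not the authors' algorithm, only a sketch of the general mechanism.

```python
import numpy as np


def top_k(x: np.ndarray, k: int) -> np.ndarray:
    """Contractive compressor: keep the k largest-magnitude entries, zero the rest.
    Satisfies ||C(x) - x||^2 <= (1 - k/d) ||x||^2, the contractive property."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out


class ErrorFeedbackWorker:
    """Illustrative worker that compresses its local message (e.g., an operator
    value or gradient) with error compensation: whatever is lost to compression
    is stored locally and added back before the next compression step."""

    def __init__(self, dim: int, k: int):
        self.error = np.zeros(dim)  # accumulated compression error
        self.k = k

    def compress_message(self, local_grad: np.ndarray) -> np.ndarray:
        corrected = local_grad + self.error   # add back previously lost information
        message = top_k(corrected, self.k)    # transmit only k coordinates
        self.error = corrected - message      # remember what was dropped
        return message


# Usage example with hypothetical dimensions:
worker = ErrorFeedbackWorker(dim=10, k=2)
compressed = worker.compress_message(np.random.randn(10))
```

Error feedback of this kind is what allows contractive (biased) compressors such as top-k to be used without permanently discarding information, in contrast to unbiased compressors that require no such buffer.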
Supplementary Material: zip
Primary Area: optimization
Submission Number: 25336