TL;DR: We analyze a non-smooth, constrained distributed optimization problem under contractive compression, establish a lower complexity bound, and propose an algorithm that matches it.
Abstract: Federated learning faces severe communication bottlenecks due to the high dimensionality of model updates. Communication compression with contractive compressors (e.g., Top-$K$) is often preferred in practice but can degrade performance without proper handling. Error feedback (EF) mitigates such issues but has been largely restricted to smooth, unconstrained problems, limiting its real-world applicability where non-smooth objectives and safety constraints are critical. We advance our understanding of EF in the canonical non-smooth convex setting by establishing new lower complexity bounds for first-order algorithms with contractive compression. Next, we propose Safe-EF, a novel algorithm that matches our lower bound (up to a constant) while enforcing safety constraints essential for practical applications. Extending our approach to the stochastic setting, we bridge the gap between theory and practical implementation. Extensive experiments in a reinforcement learning setup, simulating distributed humanoid robot training, validate the effectiveness of Safe-EF in ensuring safety and reducing communication complexity.
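For readers unfamiliar with these primitives, the sketch below is a minimal Python/NumPy illustration of a Top-$K$ contractive compressor together with the standard error-feedback step, in which each client adds its accumulated compression error back to the update before compressing so that no information is permanently discarded. It is not taken from the Safe-EF codebase; the `top_k` and `ef_step` names and the plain-gradient setting are illustrative assumptions.

```python
import numpy as np

def top_k(v: np.ndarray, k: int) -> np.ndarray:
    """Top-K contractive compressor: keep only the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]  # indices of the k largest |v_i|
    out[idx] = v[idx]
    return out

def ef_step(grad: np.ndarray, error: np.ndarray, k: int, lr: float):
    """One client-side error-feedback step (generic EF, not the Safe-EF update rule)."""
    corrected = error + lr * grad   # re-inject the error accumulated from past compressions
    message = top_k(corrected, k)   # the compressed vector actually communicated
    new_error = corrected - message # residual kept locally and reused next round
    return message, new_error
```

The key design choice this illustrates is that the compression residual is stored locally rather than dropped, which is what allows error feedback to tolerate aggressive contractive compression.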
Lay Summary: We consider a problem in which devices such as phones, sensors, or robots collaborate to train a shared AI model without transmitting all their local data to a central server, due to resource constraints or privacy concerns. A key challenge is that each device must upload large model updates at every iteration, which can quickly saturate the communication network. Compressing updates is one approach to mitigating this bottleneck. However, naive compression often disrupts the learning process. A technique known as error feedback can compensate for the error introduced by compression, but to date, it has only proven effective for simpler tasks without constraints. Yet, constraints are critical in practice for enforcing properties such as safety and fairness in the learned model.
We introduce a novel distributed learning algorithm, Safe-EF, which incorporates error feedback in a manner that ensures constraint satisfaction while optimizing effectively. We also analyze the algorithm's performance in settings where clients compute updates from only a small subset of their local data, for example, a finite number of sampled trajectories in humanoid robot training. In simulated experiments involving humanoid robot training, Safe-EF not only reduces communication costs by orders of magnitude but also preserves the safety and reliability of the robot's behavior. This work advances the development of scalable, communication-efficient, and safe distributed AI systems.
Link To Code: https://github.com/yardenas/safe-ef
Primary Area: Optimization->Large Scale, Parallel and Distributed
Keywords: Optimization, Distributed Optimization, Compression, Safe Reinforcement Learning
Submission Number: 2839