Abstract: Federated learning enables collaborative model training across numerous edge devices without requiring participants to share data; however, memory and communication constraints on these edge
devices may preclude their participation in training. We consider a setting in which a subset of
edge devices falls below a critical memory or communication threshold required to conduct model
updates. Under typical federated optimization algorithms, these devices are excluded from training,
which renders their data inaccessible and increases system-induced bias. We are inspired by MeZO,
a zeroth-order method used for memory-efficient fine-tuning. The increased variance inherent to
zeroth-order gradient approximations has relegated previous zeroth-order optimizers exclusively to
the domain of fine tuning; a limitation we seek to correct. We devise a federated, memory-efficient
zeroth-order optimizer, ZOWarmUp that permits zeroth-order training from a random initialization. ZOWarmUp leverages differing client capabilities and careful variance reduction techniques
to facilitate participation of under-represented, low-resource clients in model training. Like other
federated zeroth-order methods, ZOWarmUp eliminates the need for edge devices to transmit their
full gradients to the server and instead relies on only a small set of random seeds, rendering the
uplink communication cost negligible. We present experiments using various datasets and model
architectures to show that ZOWarmUp is a robust algorithm that can be applied under a wide
variety of circumstances. For systems with a high proportion of edge devices that would otherwise
be excluded from training, this algorithm provides access to a greater volume and diversity of data,
thus improving training outcomes.
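
For intuition, the seed-based uplink described in the abstract can be illustrated with a minimal sketch of a MeZO-style two-point (SPSA) zeroth-order step, assuming NumPy parameter arrays and a client-side loss function. The names (`perturb`, `zo_client_step`, `loss_fn`) and hyperparameter values are hypothetical, and ZOWarmUp's variance-reduction and client-capability mechanisms are not shown; this is only the generic seed-and-scalar communication pattern.

```python
import numpy as np

def perturb(params, seed, eps, sign):
    # Regenerate the same Gaussian perturbation z from `seed` and shift the
    # parameters in place by sign * eps * z (z never has to be stored).
    rng = np.random.default_rng(seed)
    for p in params:
        p += sign * eps * rng.standard_normal(p.shape)

def zo_client_step(params, loss_fn, seed, eps=1e-3, lr=1e-4):
    # Two-point (SPSA-style) estimate of the directional derivative along z.
    perturb(params, seed, eps, +1)           # theta + eps * z
    loss_plus = loss_fn(params)
    perturb(params, seed, eps, -2)           # theta - eps * z
    loss_minus = loss_fn(params)
    perturb(params, seed, eps, +1)           # restore theta
    proj_grad = (loss_plus - loss_minus) / (2 * eps)

    # Apply the update by regenerating z from the same seed.
    rng = np.random.default_rng(seed)
    for p in params:
        p -= lr * proj_grad * rng.standard_normal(p.shape)

    # Only the seed and one scalar need to leave the device; the server can
    # reproduce z and replay the identical update, so no gradients are uplinked.
    return seed, proj_grad
```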