AdaCoOpt: Leverage the Interplay of Batch Size and Aggregation Frequency for Federated Learning

Published: 01 Jan 2023, Last Modified: 06 Feb 2025 · IWQoS 2023 · CC BY-SA 4.0
Abstract: Federated Learning (FL) is a distributed learning paradigm that coordinates heterogeneous edge devices to perform model training without sharing private raw data. Many prior works have analyzed FL convergence with respect to important hyperparameters, including batch size and aggregation frequency. However, adjusting the batch size and the number of local updates affects model performance, training time, and the cost of computation and communication resources in different and potentially complex ways. Their joint effects have been overlooked and should be exploited to achieve accurate models with controllable operational expenditure. This paper proposes novel analytical models and optimization algorithms that leverage the interplay of batch size and aggregation frequency to navigate the trade-offs among convergence, cost, and completion time for FL. We first obtain a new convergence bound on the training error under heterogeneous training datasets across devices. Based on this bound, we derive closed-form solutions for a co-optimized batch size and aggregation frequency, applied as a single configuration for all devices. To address the heterogeneity of both data and system characteristics, we then design an efficient exact algorithm that assigns different batch configurations across devices, further improving model accuracy. Further, we propose an adaptive control algorithm that dynamically adjusts the solutions using estimated network states. Extensive experiments demonstrate the superiority of our offline optimal solutions and online adaptive algorithm.
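To make the described interplay concrete, the following is a minimal sketch (not the paper's AdaCoOpt algorithm) of how a controller might co-select a batch size and aggregation frequency each round under a per-round resource budget. The cost model, the error surrogate, and all constants (`comp_per_sample`, `comm_cost`, `grad_var`, `drift`) are hypothetical placeholders, not quantities taken from the paper.

```python
# Hypothetical sketch: co-selecting batch size b and local-update count tau per round.
# Larger b and tau raise the per-round compute cost; the error surrogate assumes larger
# batches reduce gradient variance while more local steps amplify client drift.
import itertools


def round_cost(b, tau, comp_per_sample=1e-3, comm_cost=2.0):
    """Estimated cost of one round: tau local steps on batches of size b, plus one model upload."""
    return tau * b * comp_per_sample + comm_cost


def error_proxy(b, tau, grad_var=1.0, drift=0.05):
    """Hypothetical surrogate for convergence-bound terms: variance shrinks with b*tau,
    while a drift penalty grows with the number of local steps."""
    return grad_var / (b * tau) + drift * (tau - 1)


def choose_config(budget, batch_grid=(16, 32, 64, 128), tau_grid=(1, 2, 4, 8, 16)):
    """Pick the (b, tau) pair with the smallest error surrogate that fits the round budget."""
    feasible = [(b, t) for b, t in itertools.product(batch_grid, tau_grid)
                if round_cost(b, t) <= budget]
    # Fall back to the cheapest configuration if nothing fits the budget.
    return min(feasible, key=lambda bt: error_proxy(*bt)) if feasible else (min(batch_grid), 1)


# A tighter budget pushes the controller toward cheaper (b, tau) configurations.
for budget in (1.0, 3.0, 10.0):
    print(budget, choose_config(budget))
```

In the paper's setting, such a selection would instead be driven by the derived convergence bound and by network states estimated online, and could differ per device; the grid search above merely illustrates the trade-off being navigated.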