Abstract: We propose the Natural Gradient Primal-Dual (NGPD) method for decentralized learning of the parameters of Deep Neural Networks (DNNs). Conventional approaches, such as the primal-dual method, constrain the local parameters of connected nodes to be similar. However, since most such methods rely on first-order optimization and the loss functions of DNNs may have ill-conditioned curvature, they require many local parameter updates and much communication among local nodes. For fast convergence, we integrate the second-order natural gradient method into the primal-dual method (NGPD). Because the added constraint minimizes the change in the network output before and after each parameter update, robustness to ill-conditioned curvature is expected. We theoretically derive the convergence rate of the averaged parameter (the average of the local parameters) under certain assumptions. For a practical implementation of NGPD without a significant increase in computational overhead, we adopt Kronecker-Factored Approximate Curvature (K-FAC). Our experiments on image classification tasks with DNNs confirm that NGPD achieves the highest test accuracy among the compared methods.
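The abstract's core motivation is that first-order updates converge slowly on ill-conditioned loss surfaces, while a natural-gradient (curvature-preconditioned) step does not. The toy below is a minimal sketch of that effect only, not the paper's NGPD algorithm: for a single linear layer with squared loss, the Fisher/Gauss-Newton matrix reduces to the input second-moment matrix `A`, so the natural-gradient step is the plain gradient right-multiplied by `A^{-1}` (a K-FAC-style per-layer preconditioner). All variable names and the setup are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ill-conditioned linear regression: loss(W) = 0.5/n * ||X W^T - Y||^2.
# Inputs with very different scales make the curvature ill-conditioned,
# which is the situation the abstract says slows first-order methods.
d_in, d_out, n = 4, 2, 512
scales = np.array([1.0, 10.0, 0.1, 5.0])       # per-feature input scales
X = rng.normal(size=(n, d_in)) * scales
W_true = rng.normal(size=(d_out, d_in))
Y = X @ W_true.T

def loss(W):
    return 0.5 * np.mean(np.sum((X @ W.T - Y) ** 2, axis=1))

# For a linear layer with squared loss, the Fisher/Gauss-Newton matrix is
# A (x) I with A = E[x x^T]; a small damping term keeps A invertible.
A = X.T @ X / n + 1e-6 * np.eye(d_in)

W_gd = np.zeros((d_out, d_in))                  # plain gradient descent
W_ng = np.zeros((d_out, d_in))                  # natural-gradient descent
lr_gd = 1.0 / np.linalg.eigvalsh(A).max()       # largest stable GD step
for _ in range(100):
    g_gd = (X @ W_gd.T - Y).T @ X / n           # Euclidean gradient
    W_gd -= lr_gd * g_gd
    g_ng = (X @ W_ng.T - Y).T @ X / n
    W_ng -= g_ng @ np.linalg.inv(A)             # preconditioned step

# The preconditioned run ends far closer to the optimum than plain GD.
print(loss(W_ng) < loss(W_gd))
```

Here the preconditioner `A^{-1}` rescales the update per input direction, so the step size no longer has to be throttled by the largest curvature direction; this is the mechanism (in miniature) behind the claimed robustness to ill-conditioned curvature.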