Decoupling Backpropagation using Constrained Optimization Methods

Akhilesh Gotmare; Valentin Thomas; Johanni Brea; Martin Jaggi

Published: 27 Jun 2018, Last Modified: 05 May 2023. ICML 2018 ECA Submission. Readers: Everyone

Keywords: model parallelism, backpropagation, decoupling, neural networks

TL;DR: We propose BlockProp, which lets one train deep neural networks in a model-parallel fashion, where parts of the model may reside on different devices (GPUs).

Abstract: We propose BlockProp, a neural network training algorithm. Unlike backpropagation, it does not rely on direct top-to-bottom propagation of an error signal. Instead, by interpreting backpropagation as a constrained optimization problem, we split the neural network into sets of layers (blocks) that must satisfy a consistency constraint: the output of one block must equal the input of the next. These decoupled blocks are then updated with the gradient of the constraint violation. The main advantage of this formulation is that it decouples the propagation of the error signal across the blocks of the network, which makes it particularly relevant for multi-device applications.
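
The abstract does not spell out the optimization scheme, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a two-block network, an auxiliary variable `z` standing in for the output of the first block, and a quadratic-penalty relaxation of the consistency constraint with weight `rho`. The names `blockwise_train`, `rho`, `z`, `W1`, and `W2` are illustrative and do not come from the paper.

```python
import numpy as np


def blockwise_train(x, y, hidden=8, rho=1.0, lr=0.05, steps=300, seed=0):
    """Train a two-block network with a quadratic penalty on the block interface.

    Block 1 computes h = tanh(x @ W1). Block 2 computes y_hat = z @ W2.
    z is an auxiliary copy of block 1's output. The consistency constraint
    z == h is relaxed to the penalty (rho / 2) * ||h - z||^2 (an assumption
    made for this sketch, not a detail taken from the paper).
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    W1 = rng.normal(scale=0.1, size=(x.shape[1], hidden))
    W2 = rng.normal(scale=0.1, size=(hidden, y.shape[1]))
    z = np.tanh(x @ W1)  # start with the constraint satisfied exactly

    def objective():
        h = np.tanh(x @ W1)
        task = 0.5 * np.sum((z @ W2 - y) ** 2) / n
        violation = 0.5 * np.sum((h - z) ** 2) / n
        return task + rho * violation, task, violation

    history = [objective()]
    for _ in range(steps):
        h = np.tanh(x @ W1)
        r = h - z        # constraint violation at the block interface
        e = z @ W2 - y   # prediction error seen by block 2

        # Block 1 only sees the constraint violation, never the task loss.
        grad_W1 = x.T @ (rho * r * (1.0 - h ** 2)) / n
        # Block 2 only sees its own input z and the targets.
        grad_W2 = z.T @ e / n
        # The interface variable couples both terms. Each row of z affects
        # only its own sample, so this gradient is left unaveraged.
        grad_z = e @ W2.T - rho * r

        W1 -= lr * grad_W1
        W2 -= lr * grad_W2
        z -= lr * grad_z
        history.append(objective())
    return W1, W2, z, history
```

In this sketch the update for `W1` depends only on `x` and `z`, and the update for `W2` depends only on `z` and `y`, so the two blocks could in principle live on different devices and exchange only the interface activations. Each entry of `history` is a tuple of the penalized objective, the task loss, and the constraint violation.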
