- Keywords: model parallelism, backpropagation, decoupling, neural networks
- TL;DR: We propose BlockProp, which trains deep neural networks in a model-parallel fashion, where parts of the model may reside on different devices (GPUs).
- Abstract: We propose BlockProp, a neural network training algorithm. Unlike backpropagation, it does not rely on direct top-to-bottom propagation of an error signal. Instead, by interpreting backpropagation as a constrained optimization problem, we split the neural network into sets of layers (blocks) that must satisfy a consistency constraint, i.e., the output of one block must equal the input of the next. These decoupled blocks are then updated with the gradient of the constraint violation. The main advantage of this formulation is that it decouples the propagation of the error signal across the blocks of the network, which makes it particularly relevant for multi-device applications.
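
The abstract does not spell out the update rules, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes two blocks, a quadratic penalty on the consistency constraint, and plain gradient steps. The function name `train_two_blocks`, the auxiliary variable `z`, and the penalty weight `rho` are all hypothetical choices made for illustration.

```python
import numpy as np

def train_two_blocks(x, y, hidden=8, steps=200, lr=0.05, rho=1.0, seed=0):
    """Penalty-method sketch: two blocks coupled only through an auxiliary input z.

    Assumed objective (not taken from the paper), averaged over samples:
        0.5 * ||z @ w2 - y||^2 + 0.5 * rho * ||tanh(x @ w1) - z||^2
    """
    rng = np.random.default_rng(seed)
    n, d_in = x.shape
    d_out = y.shape[1]
    w1 = rng.normal(scale=0.5, size=(d_in, hidden))   # block 1 parameters
    w2 = rng.normal(scale=0.5, size=(hidden, d_out))  # block 2 parameters
    z = np.tanh(x @ w1)  # auxiliary input of block 2, initialised to be consistent
    history = []
    for _ in range(steps):
        h = np.tanh(x @ w1)   # block 1 output
        gap = h - z           # consistency constraint violation
        err = z @ w2 - y      # block 2 prediction error
        history.append((0.5 * np.mean(np.sum(err**2, axis=1)),
                        0.5 * np.mean(np.sum(gap**2, axis=1))))
        # Block 1 is updated from the constraint violation only;
        # it never receives the task-loss gradient directly.
        grad_w1 = x.T @ (rho * gap * (1.0 - h**2)) / n
        # Block 2 is updated from the task loss evaluated at its own input z.
        grad_w2 = z.T @ err / n
        # z trades off task loss against consistency (per-sample gradient;
        # the 1/n factor is absorbed into the step size).
        grad_z = err @ w2.T - rho * gap
        w1 -= lr * grad_w1
        w2 -= lr * grad_w2
        z -= lr * grad_z
    return w1, w2, z, history
```

In this sketch the two parameter updates share only `z` and `gap`, so in principle each block could live on its own device and exchange just those activations. Whether BlockProp uses a penalty, a Lagrangian, or another scheme to enforce the constraint is not stated in the abstract.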