Backdrop: Stochastic Backpropagation

Siavash Golkar; Kyle Cranmer

Backdrop: Stochastic Backpropagation

Siavash Golkar, Kyle Cranmer

27 Sept 2018 (modified: 22 Jun 2025)ICLR 2019 Conference Blind SubmissionReaders: Everyone

Abstract: We introduce backdrop, a flexible and simple-to-implement method, intuitively described as dropout acting only along the backpropagation pipeline. Backdrop is implemented via one or more masking layers which are inserted at specific points along the network. Each backdrop masking layer acts as the identity in the forward pass, but randomly masks parts of the backward gradient propagation. Intuitively, inserting a backdrop layer after any convolutional layer leads to stochastic gradients corresponding to features of that scale. Therefore, backdrop is well suited for problems in which the data have a multi-scale, hierarchical structure. Backdrop can also be applied to problems with non-decomposable loss functions where standard SGD methods are not well suited. We perform a number of experiments and demonstrate that backdrop leads to significant improvements in generalization.

Keywords: stochastic optimization, multi-scale data analysis, non-decomposable loss, generalization, one-shot learning

TL;DR: We introduce backdrop, intuitively described as dropout acting on the backpropagation pipeline and find significant improvements in generalization for problems with non-decomposable losses and problems with multi-scale, hierarchical data structure.

Code: [![github](/images/github_icon.svg) dexgen/backdrop](https://github.com/dexgen/backdrop)

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](https://www.catalyzex.com/paper/backdrop-stochastic-backpropagation/code)

4 Replies

Loading