Representational aspects of depth and conditioning in normalizing flows

Published: 15 Jun 2021, Last Modified: 05 May 2023
INNF+ 2021 spotlight talk
Keywords: normalizing flows, representation, function approximation, depth, conditioning
TL;DR: We show that normalizing flows, and affine couplings in particular, must trade off representational power against increased depth or poorer conditioning.
Abstract: Normalizing flows are among the most popular paradigms in generative modeling, especially for images, primarily because they allow efficient evaluation of the likelihood of a data point. Training normalizing flows can be difficult because models that produce good samples typically need to be extremely deep and are often poorly conditioned: since they are parametrized as invertible maps from $\mathbb{R}^d \to \mathbb{R}^d$, while typical training data such as images is intuitively lower-dimensional, the learned maps often have Jacobians that are close to singular. In our paper, we address representational aspects of depth and conditioning in normalizing flows, both for general invertible architectures and for a particular common architecture, affine couplings. We prove that $\Theta(1)$ affine coupling layers suffice to exactly represent a permutation or a $1 \times 1$ convolution, as used in GLOW, showing that representationally the choice of partition is not a bottleneck for depth. We also show that shallow affine coupling networks are universal approximators in Wasserstein distance if ill-conditioning is allowed, and we experimentally investigate related phenomena involving padding. Finally, we prove a depth lower bound for general flow architectures with few neurons per layer and bounded Lipschitz constant.
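As a concrete reference point for the architecture discussed in the abstract, below is a minimal sketch of a single affine coupling layer. The function names, split convention, and toy linear conditioners are illustrative assumptions, not code from the paper or any specific library.

```python
# Minimal sketch of one affine coupling layer in NumPy (illustrative only).
import numpy as np

def affine_coupling_forward(x, split, s_fn, t_fn):
    """Split x into (x1, x2); pass x1 through unchanged and map x2 to
    y2 = x2 * exp(s(x1)) + t(x1). The layer is invertible by construction."""
    x1, x2 = x[..., :split], x[..., split:]
    s, t = s_fn(x1), t_fn(x1)
    y2 = x2 * np.exp(s) + t
    log_det = s.sum(axis=-1)  # log |det Jacobian| of the coupling layer
    return np.concatenate([x1, y2], axis=-1), log_det

def affine_coupling_inverse(y, split, s_fn, t_fn):
    """Exact inverse: recover x2 from y2 using the same conditioners on y1."""
    y1, y2 = y[..., :split], y[..., split:]
    s, t = s_fn(y1), t_fn(y1)
    x2 = (y2 - t) * np.exp(-s)
    return np.concatenate([y1, x2], axis=-1)

# Toy conditioner networks (random linear maps, purely for illustration).
rng = np.random.default_rng(0)
d, split = 4, 2
W_s = rng.normal(size=(split, d - split))
W_t = rng.normal(size=(split, d - split))
s_fn = lambda x1: np.tanh(x1 @ W_s)  # bounded scale keeps the layer well-conditioned
t_fn = lambda x1: x1 @ W_t

x = rng.normal(size=(3, d))
y, log_det = affine_coupling_forward(x, split, s_fn, t_fn)
x_rec = affine_coupling_inverse(y, split, s_fn, t_fn)
assert np.allclose(x, x_rec)  # exact invertibility up to floating point
```

The Jacobian of this map is triangular, with ones on the pass-through block and $\exp(s(x_1))$ on the diagonal of the transformed block, so the log-determinant is cheap to compute and the layer's conditioning is controlled directly by the range of the scale network; this is the depth/conditioning trade-off the abstract refers to.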
Questions/Feedback Request for Reviewers: Paper to appear at ICML 2021.