Estimating the Rate-Distortion Function by Wasserstein Gradient Descent
Keywords: rate-distortion theory, optimal transport, maximum-likelihood estimation
TL;DR: We propose a new algorithm for estimating the rate-distortion (R-D) function that requires no neural networks and minimal tuning. We also highlight the connections between R-D estimation, entropic optimal transport (OT), and maximum-likelihood estimation (MLE).
Abstract: In the theory of lossy compression, the rate-distortion function $R(D)$ of a given data source characterizes the fundamental limit of compression performance by any algorithm. We propose a method to estimate $R(D)$ in the continuous setting based on Wasserstein gradient descent. While the classic Blahut--Arimoto algorithm only optimizes probability weights over the support points of its initialization, our method leverages optimal transport theory and learns the support of the optimal reproduction distribution by moving particles. This makes it better suited to high-dimensional continuous problems. Our method complements state-of-the-art neural-network-based approaches to rate-distortion estimation, achieving comparable or improved results with less tuning and computational effort. In addition, we can derive its convergence and finite-sample properties analytically. Our study also applies to maximum-likelihood deconvolution and regularized Kantorovich estimation, since these tasks reduce to mathematically equivalent minimization problems.
Submission Number: 30
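To illustrate the idea of "learning the support by moving particles" described in the abstract, here is a minimal numpy sketch. It is not the paper's exact algorithm: it assumes squared-error distortion, uniform particle weights, and plain Euclidean gradient steps on the standard rate functional $F(\nu) = \mathbb{E}_x[-\log \mathbb{E}_{y\sim\nu} e^{-\lambda \|x-y\|^2}]$; the function name and all parameter values are illustrative.

```python
import numpy as np

def rd_particle_descent(x, m=8, lam=2.0, lr=0.1, steps=300, seed=0):
    """Toy sketch (not the paper's exact algorithm): descend the rate
    functional F(nu) = E_x[-log E_{y~nu} exp(-lam*|x-y|^2)] over an
    m-particle uniform-weight distribution nu = (1/m) sum_j delta_{y_j}
    by moving the particle locations y_j along the negative gradient."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    # Initialize particles on a random subset of the data points.
    y = x[rng.choice(n, size=m, replace=False)].copy()
    losses = []
    for _ in range(steps):
        diff = x[:, None, :] - y[None, :, :]          # (n, m, dim)
        k = np.exp(-lam * (diff ** 2).sum(-1))        # Gibbs kernel, (n, m)
        s = k.mean(axis=1, keepdims=True)             # mixture density at each x_i
        losses.append(float(-np.log(s).mean()))       # current value of F
        w = k / (m * s)                               # soft assignments, (n, m)
        grad = -(2.0 * lam / n) * (w[..., None] * diff).sum(axis=0)  # dF/dy_j
        y -= lr * grad                                # move particles
    return y, losses
```

Unlike Blahut--Arimoto, which would only reweight the initial support points, every gradient step here relocates the particles themselves, so the reproduction distribution's support adapts to the source. The equivalence to MLE mentioned in the abstract is visible in the sketch: minimizing $F$ is exactly fitting a uniform-weight Gaussian mixture (bandwidth set by $\lambda$) to the data by maximum likelihood.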