Keywords: deep learning, computer vision, image reconstruction, pointwise mutual information, pmi
TL;DR: We propose a method to find the set of patches in an image that provide most information to the neural network.
Abstract: We consider the problem of distilling an image into an ordered set of maximally informative patches, given prior data from the same domain. We cast this problem as one of maximizing a pointwise mutual information (PMI) objective between a subset of an image's patches and the perceptual content of the entire image. We take an image synthesis-based approach, reasoning that the patches that are most informative would also be most useful for predicting other pixel values. We capture this idea with an image completion CNN trained to model the PMI between an image's perceptual content and any of its subregions. Because our PMI objective is a submodular, monotonic function, we can greedily construct patch sets using the CNN to obtain a provably close approximation to the intractable optimal solution. We evaluate our approach on datasets of faces, common objects, and line drawings. For all datasets, we find that a surprisingly few number of patches are needed to reconstruct most images, demonstrating a particular type of redundancy of information in images, and new potentials in their sparse representations. We also show that these minimal patch sets may be used effectively for downstream tasks such as image classification.