Keywords: structured filtering, dynamic inference
TL;DR: Dynamic receptive fields with spatial Gaussian structure are accurate and efficient.
Abstract: The visual world is vast and varied, but its variations divide into structured and unstructured factors. Structured factors, such as scale and orientation, admit clear theories and efficient representation design. Unstructured factors, such as what it is that makes a cat look like a cat, are too complicated to model analytically, and so require free-form representation learning. We compose structured Gaussian filters and free-form filters, optimized end-to-end, to factorize the representation for efficient yet general learning. Our experiments on dynamic structure, in which the structured filters vary with the input, equal the accuracy of dynamic inference with more degrees of freedom while improving efficiency. (Please see https://arxiv.org/abs/1904.11487 for the full edition.)