Keywords: Riesz transform, equivariance, spatial transformer networks, unsupervised learning, rotation equivariance, scale equivariance, group convolution, steerable filters, geometric transformations, SO(2)-equivariance, computer vision, variational autoencoding, object discovery, limited-data regimes, composite transformations
Abstract: Building models robust to transformations such as rotation, scale, and translation is a challenge in machine learning and computer vision.
Existing approaches often provide only partial and discrete equivariance (group equivariance) or rely on supervision or very abundant data to learn equivariant representations.
To achieve fine-grained equivariance from little data, we combine and improve on both approaches.
We propose a novel, learnable, Riesz-transform-based architecture that achieves built-in group equivariance for translation, rotation, and scale.
We combine it with a Spatial Transformer Network (STN) tailored for the sequential estimation of composite transformations, reducing the combinatorial data requirements for learning fine-grained equivariance.
Improved generalization guarantees and extensive experiments demonstrate that our approach outperforms state-of-the-art methods in unsupervised representation learning and object discovery, with the largest gains in low-data regimes.
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 25094