Keywords: Self Supervised Learning, Representation Learning, Rotational Equivariance, Disentanglement
Abstract: Conventional self-supervised learning methods extract robust features by enforcing invariance to data augmentations. While effective for obtaining clustered representations, this objective provides limited control over how data variations structure the feature space, hindering disentanglement. Recent methods improve feature space structure by imposing equivariant predictability on feature transformations induced by data augmentations. However, existing approaches suffer from two significant limitations: (i) the incorporation of invariance in their final objective interferes with the learning of neat equivariance; (ii) the imposition of uniform equivariance across all samples forces semantic clusters into a parallel arrangement, leading to reduced inter-cluster distances (for features on the hypersphere). To overcome these issues, we propose in this paper Cluster-Dependent Rotational Equivariance for Disentanglement (CD-RED), a framework that enables learning neat equivariance and uniformly distributed clusters, while further supporting perfect disentanglement. Notably, CD-RED explicitly encodes variations as rotations via a direct product of $SO(2)$ groups within orthogonal hyperspherical subspaces, providing a principled mechanism for precise equivariance. We theoretically and experimentally establish that CD-RED achieves perfectly disentangled representations, suggesting a promising new direction for self-supervised disentanglement.
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 9518
Loading