Seeing Through the Noise: Structural Causal Discovery of Visual Categories under Distribution Shift

Siqi Li

Published: 29 Mar 2026, Last Modified: 06 May 2026 | OpenReview Archive Direct Upload | Everyone | CC BY 4.0

Abstract: The difficulty of generalized category discovery lies not merely in clustering unlabeled images, but in doing so under a distribution shift: the unlabeled set contains classes absent from the labeled partition, and the model must infer their existence without explicit supervision. Current approaches treat this as a representation-learning problem, training encoders that map images to embeddings and then grouping those embeddings. While intuitive, this pipeline conflates two questions that are causally distinct—what makes an image belong to its category, and what makes it look the way it does in this particular snapshot. We draw a formal distinction between these by casting category discovery as a structural causal problem. In our formulation, object semantics reside in a set of localized visual primitives, while background texture, spatial layout, and photographic framing enter as independent sources of variation whose associations with category labels are purely incidental. To recover these primitives, we devise a self-supervised decomposition that learns to assign image regions to a discrete set of semantically stable indices, guided by constraints inspired by the perceptual organization principles observed in human vision. The decomposition is then locked in place through a reconstruction stage that refines the representation without semantic drift. Access to per-pixel foreground estimates enables a straightforward counterfactual generation procedure: we transplant objects into new backgrounds and apply spatial transformations, creating samples that break the spurious correlations present in the original data. Training with these counterfactuals as a regularization signal forces the discovery algorithm to attend to the invariant causal core of each instance. 
Empirical results confirm that this causal augmentation strategy, when integrated as a drop-in enhancement into existing pipelines, delivers robust improvements on fine-grained benchmarks where nuisance variation is most severe, and holds its own on coarser settings as well.
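
The counterfactual generation step described in the abstract (transplanting a foreground object into a new background and applying a spatial transformation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `make_counterfactual` and `_translate`, the choice of translation and horizontal flip as the spatial transformations, and the use of simple alpha blending with a soft foreground mask are all assumptions made for illustration. The foreground mask is taken as given here; in the paper it would come from the learned decomposition.

```python
import numpy as np


def _translate(arr, dy, dx):
    """Shift `arr` by (dy, dx) pixels, filling vacated pixels with zeros."""
    out = np.zeros_like(arr)
    h, w = arr.shape[:2]
    if abs(dy) >= h or abs(dx) >= w:
        return out  # the content is shifted entirely out of frame
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = arr[src_y, src_x]
    return out


def make_counterfactual(image, fg_mask, background, shift=(0, 0), flip=False):
    """Paste the masked foreground of `image` onto `background` (hypothetical helper).

    image, background: float arrays of shape (H, W, 3) with values in [0, 1].
    fg_mask: float array of shape (H, W) with values in [0, 1]; 1 marks foreground.
    shift: (dy, dx) translation applied to the object and its mask.
    flip: if True, mirror the object and its mask horizontally before shifting.
    """
    if image.shape != background.shape or image.shape[:2] != fg_mask.shape:
        raise ValueError("image, background and fg_mask must share height and width")

    obj, mask = image, fg_mask
    if flip:
        obj, mask = obj[:, ::-1], mask[:, ::-1]
    obj = _translate(obj, *shift)
    mask = _translate(mask, *shift)

    # Alpha-blend: foreground pixels come from the object, the rest from the background.
    alpha = mask[..., None]
    return alpha * obj + (1.0 - alpha) * background


# Usage: move a one-pixel "object" down by 1 and right by 2 onto an empty background.
image = np.ones((4, 4, 3))
fg_mask = np.zeros((4, 4))
fg_mask[1, 1] = 1.0
background = np.zeros((4, 4, 3))
counterfactual = make_counterfactual(image, fg_mask, background, shift=(1, 2))
```

Because the object keeps its appearance while the background and position change, a sample produced this way shares its category with the original image but not its nuisance factors, which is the property the regularization signal relies on.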