Disentangling Cross-Distribution Shift via Compositional Latent Structures and Covariance Regularization
Abstract: Achieving robust cross-distribution validity remains elusive, as current supervised pretraining paradigms suffer from poor generalization scalability: performance rankings fluctuate wildly under objective shifts. We identify that this instability largely stems from the inability to separate invariant latent concepts from domain-specific noise, a phenomenon exacerbated by spectral concentration in the embedding space. Conventional models fail to capture the deep, intrinsic structures that govern semantic consistency across domains.
In this work, we propose a unified framework that enhances stability by identifying and organizing conceptual primitives through compositional latent structures. Our method posits that robust generalization relies on a dual-layer structure: first, decomposing features into low-rank latent primitives, and second, applying an Inverse Geometric Alignment mechanism to regulate their covariance operators. By minimizing the Hilbert–Schmidt norm of the error covariance, we force the model to learn a compositional field that is structurally invariant to domain shifts. This approach not only uncovers the underlying building blocks of visual categories but also ensures that their geometric relationships remain isotropic. Empirical results demonstrate that integrating latent structure identification with strict geometric regularization yields a mathematically grounded pathway to stable, scalable performance on both standard generalization benchmarks and open-world discovery tasks.
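The two regularizers named above can be made concrete with a minimal sketch. The abstract does not specify the implementation, so everything below is an assumption: for finite-dimensional features, the Hilbert–Schmidt norm of a covariance operator reduces to the Frobenius norm of the sample covariance matrix, and "isotropic geometric relationships" can be measured as the distance of the feature covariance from a scaled identity. All function names (`hs_norm`, `covariance_penalty`, `isotropy_gap`) are hypothetical, not from the paper.

```python
import numpy as np

def hs_norm(M):
    # Hilbert-Schmidt norm; for a finite matrix this is the Frobenius norm.
    return np.sqrt(np.sum(M ** 2))

def covariance_penalty(errors):
    # Penalize the HS norm of the (sample) error covariance, as an
    # assumed stand-in for "minimizing the Hilbert-Schmidt norm of the
    # error covariance" described in the abstract.
    E = errors - errors.mean(axis=0, keepdims=True)
    cov = E.T @ E / (E.shape[0] - 1)
    return hs_norm(cov)

def isotropy_gap(features):
    # Distance of the feature covariance from the nearest scaled identity;
    # zero means the representation is perfectly isotropic.
    F = features - features.mean(axis=0, keepdims=True)
    cov = F.T @ F / (F.shape[0] - 1)
    scale = np.trace(cov) / cov.shape[0]
    return hs_norm(cov - scale * np.eye(cov.shape[0]))
```

In a training loop, either quantity could be added to the task loss as a weighted regularizer; this sketch only illustrates the quantities themselves, not the paper's full decomposition into latent primitives.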