Joint Embedding Variational Bayes

Published: 24 Apr 2026, Last Modified: 24 Apr 2026. Accepted by TMLR. License: CC BY 4.0
Abstract: We introduce Variational Joint Embedding (VJE), a reconstruction-free latent-variable framework for non-contrastive self-supervised learning in representation space. VJE maximizes a symmetric conditional evidence lower bound (ELBO) on paired encoder embeddings by defining a conditional likelihood directly on target representations, rather than optimizing a pointwise compatibility objective. The likelihood is instantiated as a heavy-tailed Student--\(t\) distribution on a polar representation of the target embedding, where a directional--radial decomposition separates angular agreement from magnitude consistency and mitigates norm-induced pathologies. The directional factor operates on the unit sphere, yielding a valid variational bound for the associated spherical subdensity model. An amortized inference network parameterizes a diagonal Gaussian posterior whose feature-wise variances are shared with the directional likelihood, yielding anisotropic uncertainty without auxiliary projection heads. Across ImageNet-1K, CIFAR-10/100, and STL-10, VJE is competitive with standard non-contrastive baselines under linear and \(k\)-NN evaluation, while providing probabilistic semantics directly in representation space for downstream uncertainty-aware applications. We validate these semantics through out-of-distribution detection, where representation-space likelihoods yield strong empirical performance. These results position the framework as a principled variational formulation of non-contrastive learning, in which structured feature-wise uncertainty is represented directly in the learned embedding space.
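The directional-radial decomposition mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of a univariate Student-t on the radius, the plain cosine used as a stand-in for angular agreement, and all parameter values (`df`, `scale`, the toy data) are assumptions made purely for illustration. The actual likelihood, including its treatment of the sphere and the shared feature-wise variances, is defined in the paper and the linked code.

```python
import math
import numpy as np


def polar_decompose(z, eps=1e-12):
    """Split embeddings of shape (n, d) into radii (n,) and unit directions (n, d)."""
    r = np.linalg.norm(z, axis=-1)
    u = z / np.maximum(r, eps)[..., None]
    return r, u


def student_t_logpdf(x, loc=0.0, scale=1.0, df=3.0):
    """Elementwise log-density of a location-scale univariate Student-t."""
    t = (x - loc) / scale
    log_norm = (
        math.lgamma((df + 1.0) / 2.0)
        - math.lgamma(df / 2.0)
        - 0.5 * math.log(df * math.pi)
        - np.log(scale)
    )
    return log_norm - 0.5 * (df + 1.0) * np.log1p(t * t / df)


def cosine_agreement(u_a, u_b):
    """Cosine between paired unit vectors (illustrative angular-agreement proxy)."""
    return np.sum(u_a * u_b, axis=-1)


# Toy paired embeddings (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
z_pred = rng.normal(size=(4, 8))
z_target = z_pred + 0.1 * rng.normal(size=(4, 8))

r_pred, u_pred = polar_decompose(z_pred)
r_tgt, u_tgt = polar_decompose(z_target)

# Angular term depends only on directions; radial term only on norms.
angular = cosine_agreement(u_pred, u_tgt)
radial_ll = student_t_logpdf(r_tgt, loc=r_pred, scale=1.0, df=3.0)
```

Rescaling `z_target` by a positive constant changes `r_tgt` but leaves `u_tgt` (and hence `angular`) unchanged, which is the sense in which a polar representation separates angular agreement from magnitude consistency.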
Submission Type: Long submission (more than 12 pages of main content)
Changes Since Last Submission:
R1 - Reformulated the directional construction intrinsically on the sphere, with a normalized reference model and a subnormalized directional term used in training; the ELBO is now stated for the associated subprobability model. All carried-over experiments were rerun from scratch under the revised formulation rather than being retained from earlier versions. Additionally:
- Clarified fixed-observation semantics and the use of stop-gradient versus EMA target encoders.
- Replaced the main uncertainty experiment with OOD detection on CIFAR-10 across multiple near- and far-OOD datasets, based on the OpenOOD benchmark.
- Added posterior-geometry analysis, including visualizations and quantitative margin/radius diagnostics.
- Expanded the empirical evaluation across ImageNet-1K, CIFAR-10, CIFAR-100, and STL-10, including stop-gradient and EMA variants.
- Added reproduced VI-SimSiam comparisons and broader OOD benchmark context.
- Added supplementary diagnostics and ablations, including Monte Carlo sampling, isotropic variance failure, factorized vs. standard likelihood analysis, and compute profiling.
- Expanded implementation details, reproducibility notes, and limitations/future directions.
R2 - No material changes; updates are purely editorial:
- Refined Section 5 for clarity and coherence, reducing redundancies and improving presentation.
- Improved the interpretation in the geometry and OOD analyses and clarified baseline comparisons.
R3 - Minor editorial refinements; no material changes to claims, methodology, or results:
- Fixed a misstated intuition for the exp-map Jacobian term in Section 3 and clarified the small-angle limit under which the cosine-alignment reduction holds in Appendix B.
- Corrected an equation reference in Appendix C.2 (the Monte Carlo approximation targets the NLL of Eq. (4), not the ELBO).
- Fixed citation formatting in Appendices A.1 and B.
- Unified the bolding convention across Tables 4, 5, and 9 to indicate the column best and entries within one standard deviation.
Code: https://github.com/aoji/vje
Assigned Action Editor: ~Ole_Winther1
Submission Number: 7339