Position: An Empirically Grounded Identifiability Theory Will Accelerate Self Supervised Learning Research

Patrik Reizinger; Randall Balestriero; David Klindt; Wieland Brendel

Published: 01 May 2025, Last Modified: 24 Jul 2025. Venue: ICML 2025 Position Paper Track (poster). License: CC BY 4.0

TL;DR: To drive SSL forward, we should develop a principled theoretical understanding of SSL, grounded in empirical observations.

Abstract: Self-Supervised Learning (SSL) powers many current AI systems. As research interest and investment grow, the SSL design space continues to expand. The Platonic view of SSL, following the Platonic Representation Hypothesis (PRH), suggests that despite different methods and engineering approaches, all representations converge to the same Platonic ideal. However, this phenomenon lacks a precise theoretical explanation. By synthesizing evidence from Identifiability Theory (IT), we show that the PRH can emerge in SSL. There is a gap between SSL theory and practice: current IT cannot explain SSL's empirical success, though it has practically relevant insights. Our work formulates a blueprint for SSL research to bridge this gap: we propose expanding IT into what we term Singular Identifiability Theory (SITh), a broader theoretical framework encompassing the entire SSL pipeline. SITh would allow deeper insights into the implicit data assumptions in SSL and advance the field towards learning more interpretable and generalizable representations. We highlight three critical directions for future research: 1) training dynamics and convergence properties of SSL; 2) the impact of finite samples, batch size, and data diversity; and 3) the role of inductive biases in architecture, augmentations, initialization schemes, and optimizers.

Lay Summary: We pinpoint the gap between the empirical and theoretical advances in self-supervised representation learning (SSL): mostly that the focus and the research questions are different, and that there is not enough cross-pollination between the two communities. We use the lens of identifiability theory (IT) to propose a research agenda for SSL, which we believe can build upon, but needs to extend, current IT.

Primary Area: Research Priorities, Methodology, and Evaluation

Keywords: SSL, identifiability, Platonic Representation Hypothesis, model similarity, representation learning

Submission Number: 334
