Abstract: We present the Universal Latent Homeomorphic Manifold (ULHM), a framework that unifies semantic representations (e.g., human descriptions, diagnostic labels) and observation-driven machine representations (e.g., pixel intensities, sensor readings) into a single latent structure. Although the two modalities originate from fundamentally different pathways, both capture the same underlying reality. We establish homeomorphism, a continuous bijection with a continuous inverse (and hence a map that preserves topological structure), as the mathematical criterion for determining when latent manifolds induced by different semantic-observation pairs can be rigorously unified. When this criterion is satisfied, it enables three critical applications: (1) semantic-guided sparse recovery from incomplete observations, (2) cross-domain transfer learning with empirically assessed structural compatibility, and (3) transductive zero-shot compositional learning via valid transfer from semantic to observation space. Our framework learns continuous manifold-to-manifold transformations through conditional variational inference, with training objectives explicitly designed to enforce bi-Lipschitz homeomorphic properties. We develop practical verification algorithms, including trustworthiness, continuity, and Wasserstein distance metrics, that empirically indicate from finite samples whether the learned representations exhibit properties consistent with homeomorphic structure. Experiments demonstrate substantial improvements over state-of-the-art (SOTA) baselines: (1) sparse recovery from 8% of pixels on noisy CelebA with substantially lower MSE than SOTA, (2) cross-domain transfer achieving 86.73% MNIST$\rightarrow$Fashion-MNIST accuracy without retraining, and (3) transductive zero-shot classification achieving 78.76% on CIFAR-10, exceeding prior work by 16.66%.
Critically, the homeomorphism criterion determines when different semantic-observation pairs share compatible latent structure, enabling principled unification into shared representations within the tested domains and suggesting a structured basis for decomposing broad models into domain-specific components.
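The abstract names trustworthiness and continuity as two of the verification metrics. The following is a minimal NumPy sketch, assuming the standard rank-based definitions of Venna and Kaski with Euclidean distances (the paper's exact variant, neighborhood size, and distance may differ), that scores paired rows of a reference space `X` against a learned latent space `Z`. The function names are hypothetical and not taken from the released code.

```python
import numpy as np


def _neighbors_and_ranks(X: np.ndarray, k: int):
    """Return each point's k nearest neighbors and the full neighbor-rank matrix."""
    sq = np.sum(X * X, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared Euclidean distances
    np.fill_diagonal(D, np.inf)                     # a point is never its own neighbor
    order = np.argsort(D, axis=1, kind="stable")
    n = X.shape[0]
    ranks = np.empty((n, n), dtype=np.int64)
    # ranks[i, j] = 1 if j is i's nearest neighbor, 2 if second nearest, ...
    ranks[np.arange(n)[:, None], order] = np.arange(1, n + 1)
    return order[:, :k], ranks


def _rank_penalty(rank_ref, nn_ref, nn_other, k):
    """Sum of (rank_ref - k) over points in nn_other that are missing from nn_ref."""
    total = 0.0
    for i in range(rank_ref.shape[0]):
        missing = np.setdiff1d(nn_other[i], nn_ref[i])
        total += float(np.sum(rank_ref[i, missing] - k))
    return total


def trustworthiness_continuity(X: np.ndarray, Z: np.ndarray, k: int = 5):
    """Trustworthiness and continuity of embedding Z relative to reference X (rows paired)."""
    n = X.shape[0]
    if not 0 < k < n / 2:
        raise ValueError("k must satisfy 0 < k < n / 2")
    nn_x, rank_x = _neighbors_and_ranks(X, k)
    nn_z, rank_z = _neighbors_and_ranks(Z, k)
    norm = 2.0 / (n * k * (2 * n - 3 * k - 1))
    # Trustworthiness penalizes points that enter Z-neighborhoods but are far away in X.
    trust = 1.0 - norm * _rank_penalty(rank_x, nn_x, nn_z, k)
    # Continuity penalizes X-neighbors that are pushed out of Z-neighborhoods.
    cont = 1.0 - norm * _rank_penalty(rank_z, nn_z, nn_x, k)
    return trust, cont
```

For example, `trustworthiness_continuity(X, X)` returns `(1.0, 1.0)`, while pairing `X` with a row-shuffled copy of itself drives both scores well below 1. scikit-learn's `sklearn.manifold.trustworthiness` computes the trustworthiness half and can serve as a cross-check.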
Beyond PDF: zip
Submission Type: Long submission (more than 12 pages of main content)
Changes Since Last Submission: Since the last submission, we substantially revised the paper in response to reviewer and action editor (AE) feedback. The main changes are as follows.
1. We clarified the role of the theoretical results and the empirical verification procedure. In the revised manuscript, we now state explicitly that Algorithm 1 should be viewed as a finite-sample empirical diagnostic, not as a proof that a learned map is homeomorphic. We revised the wording throughout the paper to better separate what is established by the theory under idealized assumptions from what is supported empirically by quantities such as $\beta_0$, Trustworthiness, Continuity, $W_2$, and Alignment.
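Among the diagnostic quantities listed above, $\beta_0$ (the zeroth Betti number) counts connected components, and a homeomorphism leaves that count unchanged. The following hedged sketch estimates $\beta_0$ from a finite sample, assuming a symmetrized k-nearest-neighbor graph as the proximity structure (the revised paper may use a different complex or scale). It counts components with union-find; `betti0_knn` is a hypothetical name.

```python
import numpy as np


def betti0_knn(Z: np.ndarray, k: int = 5) -> int:
    """Number of connected components of the undirected k-NN graph on the rows of Z."""
    n = Z.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    sq = np.sum(Z * Z, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T  # squared Euclidean distances
    np.fill_diagonal(D, np.inf)                     # exclude self-loops
    nn = np.argsort(D, axis=1, kind="stable")[:, :k]

    parent = list(range(n))

    def find(a: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # Each k-NN edge (i, j) merges the components of i and j.
    for i in range(n):
        for j in nn[i]:
            ri, rj = find(i), find(int(j))
            if ri != rj:
                parent[ri] = rj
    return len({find(i) for i in range(n)})
```

Comparing `betti0_knn` on the reference space with the same call on the learned latent space gives one finite-sample check. A mismatch argues against homeomorphic structure, but a match alone does not establish it.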
2. We added a more complete ablation and sensitivity analysis. The supplementary material now includes sweeps over $\lambda_c$, $\lambda_\ell$, and $\lambda_p$, and for the explicit $\lambda_c$ verification sweeps we report both downstream performance and pass/fail outcomes under the empirical diagnostic. These results clarify the role of the compatibility term: when $\lambda_c = 0$, the learned structure fails in a way that is visible both in transfer behavior and in the diagnostic metrics, whereas varying $\lambda_\ell$ and $\lambda_p$ mainly produces more gradual changes in performance.
3. We added explicit negative-control experiments in the main paper. In particular, we now include incompatible triples such as MNIST + SineWave + ScrambledMNIST and MNIST + RandomNoise + R-MNIST. These experiments were added to directly address the question of whether the verification protocol can detect failure cases. The revised results show that the protocol rejects these incompatible settings and that a small $W_2$ value by itself is not enough when neighborhood-preservation metrics fail.
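The negative-control finding, that a small $W_2$ does not rescue a setting whose neighborhood-preservation metrics fail, amounts to a conjunctive acceptance rule. The sketch below illustrates only that logic. The thresholds `TAU_NEIGHBORHOOD` and `EPS_W2` and the `DiagnosticResult` record are hypothetical. The paper's actual thresholds and its Alignment criterion are not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; not values reported in the paper.
TAU_NEIGHBORHOOD = 0.9
EPS_W2 = 0.1


@dataclass(frozen=True)
class DiagnosticResult:
    beta0_latent: int
    beta0_reference: int
    trustworthiness: float
    continuity: float
    w2: float


def passes(r: DiagnosticResult) -> tuple[bool, list[str]]:
    """Accept only if every check passes; return the list of failed checks."""
    failures = []
    if r.beta0_latent != r.beta0_reference:
        failures.append("beta0")
    if r.trustworthiness < TAU_NEIGHBORHOOD:
        failures.append("trustworthiness")
    if r.continuity < TAU_NEIGHBORHOOD:
        failures.append("continuity")
    if r.w2 > EPS_W2:
        failures.append("w2")
    return (not failures, failures)
```

With these placeholder thresholds, `passes(DiagnosticResult(1, 1, 0.62, 0.58, 0.01))` is rejected with failures `["trustworthiness", "continuity"]` despite the small $W_2$. A distribution-level match does not compensate for broken neighborhoods.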
4. We expanded the empirical scope beyond the original benchmark setting. The supplementary material now includes two additional application domains: fastMRI brain reconstruction and power-grid state estimation. We added these experiments to test whether the framework remains useful outside the original image-classification setting, and we revised the main-paper claims accordingly, broadening them relative to the original version while keeping them tied to the demonstrated evidence.
5. We clarified the zero-shot setup and the fairness of the comparison. The revised paper now states clearly that our zero-shot setting is transductive rather than fully inductive: unlabeled unseen-class images are available to the representation learner during training, but their labels are withheld. We also revised the text around the zero-shot baselines so that the comparison is described more precisely and does not overstate comparability to fully inductive zero-shot benchmarks.
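The transductive protocol described above can be made concrete as a label-masking step. Every image, seen or unseen, reaches the representation learner, but the labels of unseen classes are replaced by a sentinel. This is a hypothetical sketch. The function name, the sentinel value `-1`, and the array layout are assumptions, not taken from the released code. A fully inductive split would instead drop the unseen-class images from training entirely.

```python
import numpy as np

UNLABELED = -1  # hypothetical sentinel meaning "label withheld"


def transductive_split(labels, unseen_classes):
    """Mask labels of unseen classes while keeping every sample available for training."""
    labels = np.asarray(labels)
    unseen_mask = np.isin(labels, list(unseen_classes))
    # Unseen-class samples stay in the training pool, but their labels are hidden.
    train_labels = np.where(unseen_mask, UNLABELED, labels)
    return train_labels, unseen_mask
```

For example, `transductive_split([0, 1, 2, 3, 2], {2, 3})` yields training labels `[0, 1, -1, -1, -1]` and the evaluation mask `[False, False, True, True, True]`.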
6. We added a more explicit discussion of baseline comparability. In the main paper we inserted a short note in the evaluation section, and in the supplementary material we added a dedicated baseline-comparability subsection. That subsection now summarizes, for each application, the shared supervision assumptions, training-data access, and transductive versus inductive status while still naming the baselines explicitly. This was added to make the evaluation protocol easier to audit.
7. We revised several broader claims to better match the evidence in the paper. This includes language around verification, transfer, universality, and decomposition into domain-specific components. We also added a broader-impact discussion that addresses possible failure modes in safety-critical settings and the risk of misuse when geometric compatibility is inferred too strongly from finite-sample evidence.
Beyond these substantive changes, we made many smaller revisions for clarity, consistency, and alignment across the main paper, supplement, rebuttal, and reviewer responses. In the revised main manuscript, changes are highlighted in blue.
Video: https://youtu.be/0hNeEN6eDPg
Code: https://github.com/Tong-Andrew-WU/ULHM
Supplementary Material: pdf
Assigned Action Editor: ~Gabriel_Loaiza-Ganem1
Submission Number: 7343