ToRL: Topology-preserving Representation Learning Of Object Deformations From Images

27 Sept 2024 (modified: 05 Feb 2025) | Submitted to ICLR 2025 | CC BY 4.0
Keywords: Representation Learning, Deformations, Topology
TL;DR: ToRL: Topology-preserving Representation Learning
Abstract: Representation learning of object deformations from images is a long-standing challenge in image and video analysis. Existing deep neural networks typically focus on visual features (e.g., intensity and texture) and often fail to capture the underlying geometric and topological structures of objects. This limitation is especially critical in areas such as medical imaging and 3D modeling, where maintaining the structural integrity of objects is essential for accuracy and for generalization across diverse datasets. In this paper, we introduce ToRL, a novel *Topology-preserving Representation Learning* model that, for the first time, offers an explicit mechanism for modeling intricate object topology in the latent feature space. We develop a comprehensive learning framework that captures object deformations via learned transformation groups in the latent space. Each layer of our network's decoder is designed with an integrated smooth composition module, ensuring that topological properties are preserved throughout the learning process. Moreover, unlike the few related works that rely on a reference image to predict object deformations during inference, our approach removes this impractical requirement. To validate ToRL's effectiveness, we conduct extensive multi-class classification experiments across a wide range of datasets, including synthetic 2D images, real 3D brain magnetic resonance imaging (MRI) scans, real 3D adrenal computed tomography (CT) shapes, and real 2D facial expression images. Experimental results demonstrate that ToRL outperforms state-of-the-art methods and offers a new way to enforce topological consistency in representation learning. Our code is available at https://anonymous.4open.science/r/ToRL-44BF/
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11927