Keywords: Training-free Optimization, Multimodal Learning, Representation Learning
Abstract: To enhance the interpretability of multimodal unified representations, many studies have focused on discrete unified representations. These efforts typically begin with contrastive learning and gradually extend to the disentanglement of modality-specific information, yielding solid multimodal discrete unified representations. However, existing research often overlooks two critical issues: 1) different modalities have unique characteristics, and a uniform alignment approach does not fully exploit these traits; 2) quantizing discrete representations by Euclidean distance ignores the differing importance of individual feature dimensions, resulting in redundant representations after quantization. To address these issues, we propose Fine and Coarse Cross-modal Information Disentangling (FCCID) and Training-Free Optimization of Codebook (TOC). FCCID disentangles information at fine and coarse granularities according to the specific characteristics of each modality, while TOC refines the unified discrete representations obtained from pretraining without additional training. Compared to the previous state of the art, our model demonstrates significant performance improvements. The code is provided in the supplementary materials.
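To illustrate the second issue the abstract raises, the sketch below shows standard codebook quantization, where each feature is assigned to its nearest code by Euclidean distance and every dimension is weighted equally. This is a minimal, hypothetical example (the `quantize` function, NumPy usage, and array shapes are illustrative assumptions), not the authors' implementation or the proposed TOC method.

```python
# Illustrative sketch (assumption, not the paper's code): plain Euclidean-distance
# codebook quantization, which treats all feature dimensions as equally important.
import numpy as np


def quantize(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Assign each feature vector to the index of its nearest codebook entry.

    features: (N, D) array of continuous features.
    codebook: (K, D) array of code vectors.
    Returns an (N,) array of code indices.
    """
    # Pairwise squared Euclidean distances between features and codes: shape (N, K).
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    # Every dimension contributes with equal weight, so redundant or uninformative
    # dimensions influence the assignment as much as informative ones.
    return dists.argmin(axis=1)


# Example usage with random data.
rng = np.random.default_rng(0)
indices = quantize(rng.normal(size=(8, 16)), rng.normal(size=(32, 16)))
```

Under this baseline, dimensions carrying little modality-shared information still drive code assignment; the abstract's TOC is motivated as a training-free refinement of such quantized representations.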
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4819