CLIP as Multi-Task Multi-Kernel Learning

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Supplementary Material: pdf
Primary Area: metric learning, kernel learning, and sparse coding
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Contrastive Language-Image Pretraining, Reproducing Kernel Hilbert Space, Multi-Task Multi-Kernel Learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Contrastive Language-Image Pretraining (CLIP) is a foundation model that learns a latent embedding space through an inner-product-based objective. In this paper, we provide a theoretical interpretation of CLIP using the Reproducing Kernel Hilbert Space (RKHS) framework. Specifically, we reformulate the problem of estimating the infinite-dimensional mapping with a neural network as selecting an unknown RKHS via multiple kernel learning. This connection motivates us to estimate the CLIP embedding with a multi-task multi-kernel (MTMK) method: we treat the different labels in the CLIP training data as multiple training tasks and recast learning the unknown CLIP embedding as choosing an optimal kernel from a family of Reproducing Kernel Hilbert Spaces, which is computationally more efficient. Building on the MTMK interpretation of CLIP, we also establish an optimal statistical rate for the MTMK classifier in the regime where both the number of covariates and the number of candidate kernels grow with the sample size. Beyond synthetic simulations, we apply the proposed method to align medical imaging data with clinical codes in electronic health records, and we show that our approach learns a kernel space that aligns the imaging embeddings with the text embeddings with high accuracy.
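To make the multi-task multi-kernel idea in the abstract concrete, here is a minimal, hypothetical sketch of the general recipe: combine a dictionary of candidate kernels with nonnegative weights shared across tasks, fit a per-task kernel ridge classifier under each combination, and select the weights by held-out error. The RBF dictionary, ridge solver, and grid search below are illustrative assumptions, not the paper's actual estimator or theoretical procedure.

```python
# Hypothetical sketch of multi-task multi-kernel (MTMK) selection:
# shared kernel weights across tasks, per-task kernel ridge classifiers,
# weights chosen by average validation error. Not the authors' code.
import numpy as np
from itertools import product

def rbf_kernel(X, Z, gamma):
    """Gaussian kernel matrix K[i, j] = exp(-gamma * ||x_i - z_j||^2)."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def fit_kernel_ridge(K, y, lam):
    """Solve (K + lam * I) alpha = y for the dual coefficients."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)

def mtmk_select(X_tr, Y_tr, X_va, Y_va, gammas, weight_grid, lam=1e-2):
    """Pick kernel weights minimizing average validation error over tasks.

    Y_tr, Y_va: shape (n, T), one +/-1 label column per task.
    gammas: bandwidths of the candidate RBF kernels (the dictionary).
    weight_grid: candidate weight vectors, one weight per kernel.
    """
    K_tr = [rbf_kernel(X_tr, X_tr, g) for g in gammas]
    K_va = [rbf_kernel(X_va, X_tr, g) for g in gammas]
    best_w, best_err = None, np.inf
    for w in weight_grid:
        Ktr = sum(wi * Ki for wi, Ki in zip(w, K_tr))
        Kva = sum(wi * Ki for wi, Ki in zip(w, K_va))
        errs = []
        for t in range(Y_tr.shape[1]):  # each label column is one task
            alpha = fit_kernel_ridge(Ktr, Y_tr[:, t], lam)
            pred = np.sign(Kva @ alpha)
            errs.append(np.mean(pred != Y_va[:, t]))
        if np.mean(errs) < best_err:
            best_w, best_err = w, float(np.mean(errs))
    return best_w, best_err

# Toy usage: two tasks, three candidate bandwidths, simplex weight grid.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = np.sign(np.stack([X[:, 0] + 0.1 * rng.normal(size=200),
                      X[:, 1] + 0.1 * rng.normal(size=200)], axis=1))
gammas = [0.1, 1.0, 10.0]
grid = [w for w in product(np.linspace(0, 1, 5), repeat=3)
        if abs(sum(w) - 1) < 1e-9]
w, err = mtmk_select(X[:150], Y[:150], X[150:], Y[150:], gammas, grid)
print("selected weights:", w, "validation error:", err)
```

The sketch uses a grid over the simplex only for readability; the paper's computational-efficiency claim would correspond to replacing this exhaustive search with a proper MTMK optimization over the kernel weights.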
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6786