Abstract: With the rapid development of deep learning algorithms, medical information understanding models have become an effective strategy for diagnosing clinical conditions. One of the most important sources of diagnostic evidence is medical imaging (e.g., X-rays, CT scans). Current research on medical image analysis mainly relies on pretrained models with task-specific layers, which can improve the performance of medical image classification. However, existing methods depend heavily on large amounts of labeled data produced by medical experts. When labeled data is scarce, the performance of these models deteriorates, even with pretraining. In this paper, we propose a novel Multi-modal Contrastive Learning Network (MMCLN) that incorporates less sensitive auxiliary information into a contrastive framework to improve classification performance with a limited amount of labeled images. We construct different pretrained models with customized layers to generate embeddings of both image and auxiliary data, and then compute a contrastive loss across the two domains, capturing the implicit relationships between image and auxiliary data. We conducted experiments on a lung cancer CT dataset, and our proposed model outperformed state-of-the-art methods, demonstrating its effectiveness in scenarios with limited labeled data.
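The abstract describes pairing image embeddings with auxiliary-data embeddings and computing a contrastive loss across the two domains. The sketch below shows one common way such a cross-modal objective is formulated (a symmetric InfoNCE-style loss, as in CLIP-like models); the paper's exact loss and the `info_nce_loss` function name are assumptions, since the abstract does not specify them.

```python
import math

def info_nce_loss(img_embs, aux_embs, temperature=0.1):
    """Symmetric InfoNCE-style contrastive loss between paired embeddings.

    Assumes img_embs[i] and aux_embs[i] are embeddings of the same case,
    so matched pairs are positives and all other pairs are negatives.
    This is an illustrative sketch, not the paper's exact objective.
    """
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    imgs = [normalize(v) for v in img_embs]
    auxs = [normalize(v) for v in aux_embs]
    n = len(imgs)

    # Cosine-similarity matrix, scaled by the temperature.
    sim = [[sum(a * b for a, b in zip(imgs[i], auxs[j])) / temperature
            for j in range(n)] for i in range(n)]

    def cross_entropy(row, target):
        # Numerically stable log-softmax cross-entropy for one row.
        m = max(row)
        logsumexp = m + math.log(sum(math.exp(x - m) for x in row))
        return logsumexp - row[target]

    # Average the image->auxiliary and auxiliary->image directions.
    loss_i2a = sum(cross_entropy(sim[i], i) for i in range(n)) / n
    cols = [[sim[i][j] for i in range(n)] for j in range(n)]
    loss_a2i = sum(cross_entropy(cols[j], j) for j in range(n)) / n
    return 0.5 * (loss_i2a + loss_a2i)
```

Minimizing this loss pulls each image embedding toward the auxiliary embedding of the same case and pushes it away from the others, which is how the contrastive framework can exploit auxiliary data when labeled images are scarce.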
External IDs: dblp:conf/icmla/JiaoZWMXH24