Skin Lesion Phenotyping via Nested Multi-modal Contrastive Learning

ICLR 2026 Conference Submission13078 Authors

18 Sept 2025 (modified: 08 Oct 2025) | ICLR 2026 Conference Submission | CC BY 4.0

Keywords: melanoma, skin cancer, self-supervision, representation learning, tabular metadata

TL;DR: SLIMP uses nested contrastive learning to fuse visual and clinical data at every stage of training for skin lesion analysis. The resulting representations improve downstream classification performance.

Abstract: We introduce SLIMP (Skin Lesion Image-Metadata Pre-training), which learns rich representations of skin lesions through a novel nested contrastive learning approach that captures complementary information between images and metadata. Melanoma detection and skin lesion classification based solely on images pose significant challenges due to large variations in imaging conditions (lighting, color, resolution, distance, etc.) and the lack of clinical and phenotypical context. Clinicians typically take a holistic approach when assessing a patient's risk level and deciding which lesions may be malignant and need to be excised: they consider the patient's medical history as well as the appearance of the patient's other lesions. Inspired by this, SLIMP combines the appearance and metadata of individual skin lesions with patient-level metadata relating to the patient's medical record and other clinically relevant information. By exploiting all available data modalities throughout the learning process, the proposed pre-training strategy outperforms other pre-training strategies on downstream skin lesion classification tasks, highlighting the quality of the learned representations.

Supplementary Material: zip

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 13078
