Handling Imbalanced Medical Dataset with Continuous Class Features using Improved Contrastive Learning

Jungwoo Bae, Jitae Shin

Published: 2025, Last Modified: 02 Mar 2026, ICAIIC 2025, CC BY-SA 4.0
Abstract: Medical datasets are often imbalanced because abnormal data are difficult to collect, which hinders the development of reliable classification models. In addition, although severity-based categorical classes derived from continuous features aid patient understanding, unclear class definitions can lead models to overlook the continuous nature of those classes. Such datasets are typically trained with contrastive learning methods that focus solely on class-level distinctions and do not consider the underlying continuous information. In this paper, we propose an Improved Contrastive Learning (ICL) method for learning effectively from medical datasets whose classes represent a continuous quantity. Our approach incorporates curriculum learning in a two-phase framework. In Phase 1, the original contrastive learning is applied. In Phase 2, we improve the learning process by sampling proxy lists based on class distribution parameters to address data imbalance, and by updating class distance ratios to capture the continuous features between classes. Our method outperforms existing approaches on the APTOS dataset. Furthermore, low-dimensional manifold visualizations of the learned representations show that disease features are distributed progressively according to class severity.
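
The two Phase 2 ideas named in the abstract, imbalance-aware sampling and class distances that reflect severity order, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the "class distribution parameters" or the "class distance ratios", so the sketch assumes inverse-frequency sampling and a normalized ordinal gap as stand-ins. All function names and the toy label counts are hypothetical.

```python
import random
from collections import Counter


def inverse_frequency_probs(labels, num_classes):
    """Per-class sampling probabilities proportional to 1 / class count.

    Assumption: a simple stand-in for the paper's class distribution parameters.
    """
    counts = Counter(labels)
    # Classes absent from the data get zero weight to avoid division by zero.
    weights = [1.0 / counts[c] if counts[c] > 0 else 0.0 for c in range(num_classes)]
    total = sum(weights)
    return [w / total for w in weights]


def ordinal_distance_ratios(num_classes):
    """Pairwise |i - j| gaps scaled to [0, 1] by the largest possible gap.

    Assumption: a fixed ordinal gap, whereas the paper updates its ratios during training.
    """
    max_gap = max(num_classes - 1, 1)
    return [[abs(i - j) / max_gap for j in range(num_classes)] for i in range(num_classes)]


def sample_class_list(probs, size, seed=0):
    """Draw a list of class indices so that rare classes appear more often."""
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=size)


# Toy imbalanced labels with five severity grades (hypothetical counts).
labels = [0] * 50 + [1] * 10 + [2] * 25 + [3] * 5 + [4] * 10
probs = inverse_frequency_probs(labels, num_classes=5)
ratios = ordinal_distance_ratios(5)
proxy_classes = sample_class_list(probs, size=8)
```

Under these assumptions, the rarest grade receives the highest sampling probability, and the distance between two grades grows with their severity gap, so adjacent grades are treated as closer than distant ones.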