Split Learning of Multi-Modal Medical Image Classification

Bishwamittra Ghosh, Yuan Wang, Huazhu Fu, Qingsong Wei, Yong Liu, Rick Siow Mong Goh

Published: 2024, Last Modified: 02 Mar 2026, Venue: CAI 2024, License: CC BY-SA 4.0

Abstract: In the past decades, machine learning (ML) has made significant progress in medical image classification. This success can be attributed to two factors: (i) unique patient data collected and processed by clinics and hospitals and (ii) ML models that solve the underlying classification task. In practice, patient data may contain sensitive information specific to patients’ demographics, and ML models often require computational resources beyond what an individual hospital can afford. Considering these practical concerns, we explore a collaborative ML approach in which the data provider, referred to as the client, leverages the computational resources of a server to jointly train a unified ML model without sharing any raw data. Specifically, we focus on the skin lesion classification problem using a real-world dataset containing multi-modal image inputs and multi-label ground truth. To enable collaborative yet privacy-preserving skin lesion classification, we develop a learning framework called SplitFusionNet based on U-shaped split learning. The key idea of SplitFusionNet is to split the ML model into a (client, server) partition of deep neural network layers: the client layers process multi-modal input data and multi-labels, while the server layers perform computationally intensive mid-layer computations. Additionally, we apply lossless compression and decompression to reduce the communication cost between the client and the server. Experimentally, SplitFusionNet requires less training pipeline time than non-split centralized training while achieving equal predictive performance.
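
The U-shaped partition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy dense layers, the weight values, the function names (`client_head`, `server_body`, `client_tail`), and the choice of `zlib` as the lossless codec are all assumptions made for illustration. Only the forward pass is shown; in training, gradients would presumably travel back along the same client-server path.

```python
import struct
import zlib


def pack(values):
    """Serialize floats as little-endian float64 and compress losslessly."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return zlib.compress(raw)


def unpack(blob):
    """Invert pack(): decompress and recover the exact float64 values."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))


def linear_relu(x, weights, bias):
    """One dense layer followed by ReLU; `weights` is a list of rows."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


# Hypothetical toy parameters (illustrative only, not from the paper).
HEAD_W, HEAD_B = [[0.5, -0.25], [0.1, 0.3]], [0.0, 0.1]
BODY_W, BODY_B = [[1.0, 0.5], [-0.5, 2.0]], [0.05, 0.0]
TAIL_W, TAIL_B = [[0.7, 0.2]], [-0.1]


def client_head(x):
    """Client side: first layers see the raw input, which never leaves."""
    return linear_relu(x, HEAD_W, HEAD_B)


def server_body(blob):
    """Server side: middle layers operate on activations only."""
    hidden = unpack(blob)
    return pack(linear_relu(hidden, BODY_W, BODY_B))


def client_tail(blob):
    """Client side: last layers, where labels and loss would also live."""
    return linear_relu(unpack(blob), TAIL_W, TAIL_B)


def split_forward(x):
    """U-shaped forward pass: client -> server -> client."""
    uplink = pack(client_head(x))   # compressed activations sent to server
    downlink = server_body(uplink)  # compressed activations sent back
    return client_tail(downlink)


def centralized_forward(x):
    """The same network evaluated in one place, without any split."""
    hidden = linear_relu(x, HEAD_W, HEAD_B)
    hidden = linear_relu(hidden, BODY_W, BODY_B)
    return linear_relu(hidden, TAIL_W, TAIL_B)
```

Because the codec is lossless, `split_forward` and `centralized_forward` return identical outputs for the same input, which is consistent with the abstract's claim that splitting need not change predictive performance.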

External IDs: dblp:conf/ieeecai/GhoshWFWLG24