Abstract: Breast cancer treatments often affect patients’ body image, which makes predicting aesthetic outcomes important. This study introduces a Deep Learning (DL) multimodal retrieval pipeline built on a dataset of 2,193 instances that combine clinical attributes with RGB images of patients’ upper torsos. We evaluate four retrieval techniques: Weighted Euclidean Distance (WED) in several configurations and a shallow Artificial Neural Network (ANN) for tabular data; pre-trained and fine-tuned Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) for images; and a multimodal approach combining both data types. The dataset, categorised into Excellent/Good and Fair/Poor outcomes, is organised into more than 20,000 triplets for training and testing. Results show that fine-tuned multimodal ViTs notably improve performance, reaching up to 73.85% accuracy and 80.62% Adjusted Discounted Cumulative Gain (ADCG). By retrieving the most relevant post-surgical images, the framework can help manage patient expectations, and it may also apply more broadly to medical image analysis and retrieval. The main contributions of this paper are a multimodal retrieval system for breast cancer patients based on post-surgery aesthetic outcome, and an evaluation of different models on a new, clinician-annotated dataset for image retrieval.
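As an illustration of the tabular baseline named in the abstract, the following is a minimal sketch of Weighted Euclidean Distance retrieval and a triplet-based accuracy check. The abstract does not give the feature set, the weights, or the exact evaluation protocol, so the function names, the toy feature vectors, and the rule "a triplet is correct when the same-outcome case is closer to the query than the different-outcome case" are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np


def weighted_euclidean(query, candidates, weights):
    """Distance sqrt(sum_i w_i * (x_i - q_i)^2) from query to each candidate row."""
    diff = np.asarray(candidates, dtype=float) - np.asarray(query, dtype=float)
    return np.sqrt((np.asarray(weights, dtype=float) * diff ** 2).sum(axis=1))


def retrieve(query, candidates, weights, k=3):
    """Indices of the k candidates closest to the query (hypothetical helper)."""
    distances = weighted_euclidean(query, candidates, weights)
    return np.argsort(distances, kind="stable")[:k]


def triplet_accuracy(triplets, weights):
    """Fraction of (query, positive, negative) triplets ranked correctly.

    Assumed criterion: the positive must be strictly closer than the negative.
    """
    correct = 0
    for query, positive, negative in triplets:
        d_pos, d_neg = weighted_euclidean(query, [positive, negative], weights)
        correct += int(d_pos < d_neg)
    return correct / len(triplets)


# Toy data: two made-up clinical features per patient.
query = [0.0, 0.0]
candidates = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]

uniform_ranking = retrieve(query, candidates, weights=[1.0, 1.0])
reweighted_ranking = retrieve(query, candidates, weights=[10.0, 0.1])
```

With uniform weights the first candidate ranks closest; weighting the first feature heavily moves the second candidate to the top, which shows how the choice of weights changes what is retrieved.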
External IDs: doi:10.1007/978-3-031-77789-9_14