Improving Neoadjuvant Therapy Response Prediction by Integrating Longitudinal Mammogram Generation with Cross-Modal Radiological Reports: A Vision-Language Alignment-Guided Model

Published: 01 Jan 2024, Last Modified: 13 Nov 2024, Venue: MICCAI (1) 2024, Readers: Everyone, License: CC BY-SA 4.0
Abstract: Longitudinal imaging examinations are vital for predicting pathological complete response (pCR) to neoadjuvant therapy (NAT) by assessing changes in tumor size and density. However, the imaging modalities acquired at different time points during NAT often differ across patients, hindering comprehensive treatment response estimation when utilizing multi-modal information. This may result in underestimation or overestimation of disease status. Moreover, existing longitudinal image generation models rely mainly on raw-pixel inputs and rarely explore integration with practical longitudinal radiology reports, which can convey valuable temporal information on disease remission or progression. Furthermore, extracting text-aligned dynamic information from longitudinal images remains challenging. To address these issues, we propose a longitudinal image-report alignment-guided model for longitudinal mammogram generation using cross-modality radiology reports. We use the generated mammograms to compensate for absent mammograms in our pCR prediction pipeline. Our experimental results achieve performance comparable to the theoretical upper bound, thereby providing a potential 3-month window for therapeutic replacement. The code will be made publicly available.