Improving Neoadjuvant Therapy Response Prediction by Integrating Longitudinal Mammogram Generation with Cross-Modal Radiological Reports: A Vision-Language Alignment-Guided Model

Yuan Gao; Hong-Yu Zhou; Xin Wang; Tianyu Zhang; Luyi Han; Chunyao Lu; Xinglong Liang; Jonas Teuwen; Regina Beets-Tan; Tao Tan; Ritse Mann

Published: 01 Jan 2024, Last Modified: 13 Nov 2024, Venue: MICCAI (1) 2024, License: CC BY-SA 4.0

Abstract: Longitudinal imaging examinations are vital for predicting pathological complete response (pCR) to neoadjuvant therapy (NAT) by assessing changes in tumor size and density. However, the imaging modalities acquired at different time points during NAT often differ across patients, hindering comprehensive treatment response estimation from multi-modal information and potentially leading to underestimation or overestimation of disease status. In addition, existing longitudinal image generation models rely mainly on raw-pixel inputs and have rarely explored integration with longitudinal radiology reports from clinical practice, which can convey valuable temporal information on disease remission or progression. Furthermore, extracting text-aligned dynamic information from longitudinal images remains challenging. To address these issues, we propose a longitudinal image-report alignment-guided model for longitudinal mammogram generation using cross-modality radiology reports. We use the generated mammograms to compensate for absent mammograms in our pCR prediction pipeline. In our experiments, the model achieves performance comparable to the theoretical upper bound, thereby providing a potential 3-month window for therapeutic replacement. The code will be made publicly available.
