Keywords: survival analysis, multimodal learning, temporal attention, transformer, longitudinal data
TL;DR: MultiTimeSurv combines temporal attention mechanisms with transformer-based multimodal fusion to predict survival outcomes from longitudinal tabular data and images, outperforming existing methods while handling missing data effectively.
Abstract: Survival analysis requires modeling complex temporal dependencies and multimodal data to predict outcomes accurately. Existing state-of-the-art methods, such as Dynamic-DeepHit, have advanced temporal survival modeling but remain constrained to tabular data and cannot leverage multimodal information, leaving critical gaps in handling irregular sampling, heterogeneous modalities, and cross-modal alignment. To address these gaps, we introduce MultiTimeSurv, a novel deep learning framework that integrates longitudinal tabular data with image analysis for dynamic survival prediction. Our approach addresses three key challenges: (1) capturing temporal evolution through attention-based recurrent networks, (2) processing multimodal data via specialized feature encoders for tabular embeddings and a transformer-based image analysis module, and (3) handling missing data patterns common in real-world settings. MultiTimeSurv employs contextual embeddings for categorical and continuous variables, a temporal attention mechanism for longitudinal modeling, and a fully transformer-based architecture for extracting visual-textual features from images. We evaluate MultiTimeSurv on multiple datasets, including hospitalization data, longitudinal studies, and multimodal image-text datasets, where it outperforms current state-of-the-art survival analysis methods. On SYMILE-MIMIC, it consistently surpasses classical and neural baselines across all horizons, exceeding a C-index of 0.70 for long-term predictions.
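To make the described architecture concrete, the following is a minimal sketch (not the authors' code) of the kind of model the abstract outlines: longitudinal tabular embeddings passed through a recurrent network with temporal attention, fused with transformer-derived image features, and mapped to a discrete-time survival head. All module names, dimensions, and the discrete-time formulation are illustrative assumptions.

```python
# Hypothetical sketch of a temporal-attention multimodal survival model.
# Not the MultiTimeSurv implementation; dimensions and heads are assumed.
import torch
import torch.nn as nn

class TemporalAttentionSurv(nn.Module):
    def __init__(self, tab_dim=32, img_dim=128, hidden=64, num_bins=10):
        super().__init__()
        # Contextual embedding of longitudinal tabular features at each visit
        self.tab_encoder = nn.Linear(tab_dim, hidden)
        # Recurrent network over visits, with attention pooling across time
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)
        # Stand-in projection for features from a transformer image encoder
        self.img_proj = nn.Linear(img_dim, hidden)
        # Discrete-time survival head: one hazard logit per time bin
        self.head = nn.Linear(2 * hidden, num_bins)

    def forward(self, tab_seq, tab_mask, img_feat):
        # tab_seq: (B, T, tab_dim); tab_mask: (B, T), 1 = observed visit
        h, _ = self.rnn(self.tab_encoder(tab_seq))        # (B, T, hidden)
        scores = self.attn(h).squeeze(-1)                 # (B, T)
        scores = scores.masked_fill(tab_mask == 0, -1e9)  # skip missing visits
        w = torch.softmax(scores, dim=-1).unsqueeze(-1)   # (B, T, 1)
        tab_ctx = (w * h).sum(dim=1)                      # attention-pooled history
        fused = torch.cat([tab_ctx, self.img_proj(img_feat)], dim=-1)
        return torch.sigmoid(self.head(fused))            # per-bin hazard estimates

model = TemporalAttentionSurv()
hazards = model(torch.randn(4, 6, 32), torch.ones(4, 6), torch.randn(4, 128))
print(hazards.shape)  # torch.Size([4, 10])
```

The masked attention pooling is one simple way to handle the irregular sampling and missing visits mentioned in the abstract; the paper's actual mechanism may differ.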
Primary Area: infrastructure, software libraries, hardware, systems, etc.
Submission Number: 22202