\section{Introduction}
Magnetic Resonance Imaging (MRI)-based brain lesion segmentation is vital for diagnosis, treatment planning, and longitudinal monitoring of neurological disorders~\cite{despotovic2015mri,sadegheih2025lhu}. However, clinical deployment is challenging because clinical data are inherently dynamic~\cite{Kum_Continual_MICCAI2024,kumari2025domain,kumari2025attention}. Modality availability, acquisition protocols, and pathology distributions vary widely, and new imaging sequences or auxiliary information continue to emerge. Different hospitals and research centers produce cohorts with diverse modality configurations, pathologies, and patient populations, yielding highly heterogeneous MRI datasets.
Conventional U-Net-based models are typically trained for specific modality--pathology combinations, so each new combination requires retraining or a separate cohort-specific model, a practice that is resource-intensive and limits generalization. Alternatively, joint training approaches learn a single model that handles variable modality configurations~\cite{xu2024feasibility,zhang2025foundation}, but they require simultaneous access to all datasets, which is rarely feasible due to privacy, acquisition timing, and storage constraints. These models also degrade when test cohorts exhibit modality or pathology distributions not seen during training, which again often necessitates retraining.

A more realistic setting is to learn the datasets sequentially as they become available. Naively updating a U-Net in this scenario leads to Catastrophic Forgetting (CF), where performance on earlier datasets deteriorates sharply~\cite{chen2022continual}. Continual Learning (CL) offers a framework to alleviate this problem by enabling models to integrate new information while retaining prior knowledge~\cite{kumari2025continual}. However, CL methods developed for natural images often underperform in medical segmentation, where dense, voxel-wise prediction substantially amplifies forgetting~\cite{gonzalez2023lifelong}. Lifelong U-Net~\cite{gonzalez2023lifelong} showed that buffer-free approaches struggle to maintain segmentation accuracy and perform substantially worse than joint or cumulative training. Rehearsal-based methods, which replay a small subset of past samples, retain knowledge better but still achieve limited backward transfer. Although experience replay is highly effective in the natural image and audio domains~\cite{rolnick2019experience,bhatt2024characterizing}, it remains underexplored in heterogeneous-modality MRI settings.

Beyond mitigating forgetting, continual brain MRI segmentation must also accommodate variable modality configurations across datasets. Cohorts are acquired with different sets and numbers of modalities, and future acquisitions may introduce modalities unseen in earlier stages. A recent modality-agnostic framework~\cite{sadegheih2025modality} demonstrated the feasibility of continual segmentation with heterogeneous modality inputs, yet CF remained substantial. Moreover, the framework assumes prior knowledge of the maximum number of modalities across all datasets, which prevents seamless extension to cohorts containing novel or auxiliary sequences. As imaging protocols continue to evolve, this constraint limits clinical applicability.
A continual segmentation system must therefore support arbitrary and evolving modality combinations while maintaining robust performance across sequential domains.
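
Rehearsal can be illustrated with a plain experience-replay buffer. The Python sketch below, using the hypothetical names \texttt{ReservoirReplayBuffer} and \texttt{mixed\_batch}, keeps a fixed number of past cases via reservoir sampling, a common buffer-update rule in experience replay, and concatenates replayed cases with current-dataset cases in each mini-batch. Cases are represented by placeholder \texttt{(dataset\_id, case\_id)} tuples. The uniform update rule is an assumption chosen for illustration; it is not a lesion-aware selection policy.

\begin{verbatim}
import random
from typing import Any, List, Optional


class ReservoirReplayBuffer:
    """Hypothetical fixed-capacity replay buffer (illustration only).

    Reservoir sampling (Algorithm R): after n cases have been seen, each of
    them is stored with the same probability capacity / n, regardless of
    which dataset it came from.
    """

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        self.capacity = capacity
        self.items: List[Any] = []
        self.num_seen = 0
        self.rng = random.Random(seed)

    def add(self, item: Any) -> None:
        self.num_seen += 1
        if len(self.items) < self.capacity:
            # Buffer not yet full: always store the case.
            self.items.append(item)
        else:
            # Overwrite a random slot with probability capacity / num_seen.
            j = self.rng.randrange(self.num_seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k: int) -> List[Any]:
        # Replay mini-batch drawn without replacement from the buffer.
        return self.rng.sample(self.items, min(k, len(self.items)))


def mixed_batch(current: List[Any], buffer: ReservoirReplayBuffer,
                replay_size: int) -> List[Any]:
    # Current-dataset cases plus replayed cases from earlier datasets.
    return list(current) + buffer.sample(replay_size)


if __name__ == "__main__":
    buf = ReservoirReplayBuffer(capacity=4, seed=0)
    for dataset_id in range(3):                  # three sequential datasets
        current = [(dataset_id, case) for case in range(10)]
        # A training step on this dataset would consume batches like this.
        print(dataset_id, len(mixed_batch(current[:2], buf, replay_size=2)))
        for case in current:                     # cases enter the buffer
            buf.add(case)
    print(len(buf.items), buf.num_seen)          # 4 30
\end{verbatim}

With a capacity of 4 and three datasets of 10 cases each, the buffer ends with 4 stored cases out of 30 seen, and every case has the same probability $4/30$ of being retained. A prioritized policy, such as lesion-aware selection, instead departs from this uniformity by favoring informative cases.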



We propose the Continual Learning in Modality-agnostic U-Net (CLMU-Net), a replay-based framework designed for brain lesion segmentation under dynamic and heterogeneous MRI conditions. CLMU-Net integrates a lesion-aware replay buffer with lightweight textual conditioning that encodes concise descriptions of modality availability and lesion characteristics. Cross-attention injects these domain-aware cues into the bottleneck features, guiding the network toward domain-appropriate representations. A conceptually simple yet effective channel-inflation mechanism supports arbitrary modality subsets without requiring a predefined maximum set, allowing seamless adaptation as new cohorts or modalities appear. Together, these components allow CLMU-Net to learn sequentially with substantially reduced forgetting. We evaluate CLMU-Net on five diverse brain MRI datasets spanning different lesion types, modality configurations, and acquisition centers. Across two training orders of the datasets, CLMU-Net consistently outperforms both buffer-free and rehearsal-based baselines, yielding higher Dice scores and improved stability under variable-modality conditions. Our key contributions are as follows: \ding{182} We introduce a lesion-aware replay buffer that prioritizes structurally informative and uncertain samples, improving knowledge retention under strict buffer budgets. \ding{183} We develop a domain-conditioned textual guidance mechanism that injects global modality and lesion cues into bottleneck features through cross-attention. \ding{184} We propose a modality-flexible input mechanism based on channel inflation that supports arbitrary and previously unseen modality combinations without predefined limits. \ding{185} We demonstrate consistent performance gains across five heterogeneous 3D brain MRI datasets and multiple sequential training orders, establishing CLMU-Net as a strong framework for continual brain lesion segmentation under real clinical variability.
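
Channel inflation and textual cross-attention are described here only at a high level, so the Python/NumPy sketch below is one plausible reading under stated assumptions, not the CLMU-Net implementation. All function names (\texttt{inflate\_input\_weights}, \texttt{assemble\_input}, \texttt{cross\_attend}) are hypothetical. We assume that inflation appends input channels to the first 3D convolution when a new modality appears and initializes them with the mean of the existing filters, that modalities a case lacks are zero-filled in a global order that grows over time, and that the textual cues enter through single-head scaled dot-product cross-attention with a residual connection, using bottleneck voxels as queries and text-embedding tokens as keys and values.

\begin{verbatim}
import numpy as np


def inflate_input_weights(weight: np.ndarray, num_new: int) -> np.ndarray:
    """Append input channels to a 3D conv kernel (out_ch, in_ch, kD, kH, kW).
    Existing filters are kept unchanged; each new channel is initialized with
    the mean of the existing input filters (assumed initialization)."""
    mean_filter = weight.mean(axis=1, keepdims=True)        # (out, 1, kD, kH, kW)
    new_filters = np.repeat(mean_filter, num_new, axis=1)   # (out, num_new, ...)
    return np.concatenate([weight, new_filters], axis=1)


def assemble_input(volumes: dict, modality_order: list) -> np.ndarray:
    """Stack a case's modalities following the growing global modality order;
    sequences the case lacks are zero-filled. Assumes at least one is present."""
    shape = next(iter(volumes.values())).shape
    channels = [volumes.get(m, np.zeros(shape)) for m in modality_order]
    return np.stack(channels, axis=0)                       # (C, D, H, W)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)                 # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(feat, text, w_q, w_k, w_v):
    """Single-head cross-attention: bottleneck voxels feat (N, d) are queries,
    text tokens text (T, d_t) are keys/values; the result is added residually."""
    q = feat @ w_q                                          # (N, d_k)
    k = text @ w_k                                          # (T, d_k)
    v = text @ w_v                                          # (T, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (N, T), rows sum to 1
    return feat + attn @ v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Earlier stages saw T1 and FLAIR; a later cohort adds T2.
    order = ["T1", "FLAIR"]
    w = rng.standard_normal((8, len(order), 3, 3, 3))
    order.append("T2")
    w = inflate_input_weights(w, num_new=1)
    x = assemble_input({"T1": np.ones((4, 4, 4)), "T2": np.ones((4, 4, 4))}, order)
    print(w.shape, x.shape)                   # (8, 3, 3, 3, 3) (3, 4, 4, 4)

    feat = rng.standard_normal((2 * 2 * 2, 16))   # flattened bottleneck voxels
    text = rng.standard_normal((5, 32))           # encoded description tokens
    out = cross_attend(feat, text, rng.standard_normal((16, 8)),
                       rng.standard_normal((32, 8)), rng.standard_normal((32, 16)))
    print(out.shape)                          # (8, 16)
\end{verbatim}

Because the original filters are copied unchanged, the inflated convolution reproduces the old one exactly on cases whose new channel is zero-filled, regardless of how the new filters are initialized; the initialization only matters once the new modality is actually present.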






