\section{Introduction}

Segmentation of brain MR scans is an important task in neuroimaging, as it is a primary step in a wide array of subsequent analyses such as volumetry, morphology, and connectivity studies. Despite the success of modern supervised segmentation methods, especially convolutional neural networks (CNNs), their adoption in neuroimaging has been hindered by the wide variety of MRI contrasts. These approaches often require a large set of manually segmented, preprocessed images \textit{for each} desired contrast. However, since manual segmentation is costly, such supervision is often not available. A straightforward solution, implemented by widespread neuroimaging packages like FreeSurfer \cite{fischl_freesurfer_2012} or FSL \cite{jenkinson_fsl_2012}, is to require a 3D, T1-weighted scan for every subject, which is aggressively preprocessed and then segmented. However, such a requirement precludes the analysis of datasets for which 3D T1 scans are not available.

Robustness to MRI contrast variations has classically been achieved with Bayesian methods. These approaches rely on a generative model of brain MRI scans, which combines an anatomical prior (a statistical atlas) and a likelihood distribution. The likelihood typically models the image intensities of different brain regions as a Gaussian mixture model (GMM), as well as artifacts such as bias field. Test scans are segmented by ``inverting'' this generative model using Bayesian inference.
If the GMM parameters are independently derived from each test scan in an unsupervised fashion \cite{van_leemput_automated_1999,zhang_segmentation_2001,ashburner_unified_2005}, this approach is fully adaptive to MRI contrast. In some cases, \emph{a priori} information is included in the parameters, which constrains the method to a specific contrast \cite{wells_adaptive_1996,fischl_whole_2002,patenaude_bayesian_2011}, yet even these methods are generally robust to small contrast variations. Such robustness is an important reason why Bayesian techniques remain at the core of all major neuroimaging packages, such as FreeSurfer, FSL, or SPM~\cite{ashburner_spm_2012}. However, these strategies require significant computational resources (tens of minutes per scan) compared to deep learning methods, which limits their use in large-scale or time-sensitive applications.
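
As a simplified illustration of this framework, let $I = \{I_j\}$ denote the image intensities and $L = \{L_j\}$ the labels at voxels $j$. The notation is introduced here for exposition only and is not taken from any of the cited methods. We assume a single Gaussian per label with parameters $\theta = \{\mu_k, \sigma_k\}$, a voxel-wise independent atlas prior, and no bias field, whereas practical implementations may relax each of these assumptions:
\[
p(I \mid L, \theta) = \prod_{j} \mathcal{N}\!\left(I_j;\, \mu_{L_j}, \sigma^2_{L_j}\right),
\qquad
p(L) = \prod_{j} p(L_j).
\]
Segmenting a test scan then amounts to computing the most probable label map under the posterior,
\[
\hat{L} = \arg\max_{L}\, p(L \mid I, \hat{\theta}) = \arg\max_{L}\, p(I \mid L, \hat{\theta})\, p(L),
\]
where, in the unsupervised setting, $\hat{\theta}$ is estimated from the test scan itself, for instance by maximizing the marginal likelihood $p(I \mid \theta) = \sum_{L} p(I \mid L, \theta)\, p(L)$ with an expectation-maximization algorithm. Because $\hat{\theta}$ is re-estimated for every scan, nothing in this formulation is tied to a particular MRI contrast.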

Another popular family of neuroimaging segmentation methods is multi-atlas segmentation (MAS)~\cite{rohlfing_evaluation_2004,iglesias_multi-atlas_2015}. In MAS, several labeled scans (``atlases'') are registered to the test scan, and their deformed labels are merged into a final segmentation with a label-fusion algorithm~\cite{sabuncu_generative_2010}. MAS was originally designed for intra-modality problems, but can be extended to cross-modality problems by using multi-modality registration metrics like mutual information~\cite{wells_adaptive_1996,maes_multimodality_1997}. However, its performance in this scenario is poor, due to the limited accuracy of nonlinear registration algorithms across modalities~\cite{iglesias_is_2013}. Another main drawback of MAS has traditionally been the high computational cost of the multiple nonlinear registrations. While this is quickly changing with the advent of fast, deep-learning-based registration techniques~\cite{balakrishnan_voxelmorph_2019,de_vos_end--end_2017}, accurate deformable registration for arbitrary modalities has not been widely demonstrated with these methods.

The modern segmentation literature is dominated by CNNs~\cite{milletari_v-net_2016,kamnitsas_efficient_2017}, particularly the U-Net architecture~\cite{ronneberger_u-net_2015}. Although CNNs produce fast and accurate segmentations when trained for modality-specific applications, they typically do not generalize well to image contrasts that differ from those of the training data~\cite{akkus_deep_2017,jog_pulse_2018,karani_lifelong_2018}. A possible solution is to train a network with multi-modal data, possibly with modality dropout during training~\cite{havaei_hemis_2016}. However, this assumes access to manually labeled data for a wide range of acquisitions, which is rarely feasible given the cost of manual segmentation. One can also augment the training dataset with synthetic contrast variations that are not initially available from uni- or multi-modal scans~\cite{chartsias_multimodal_2018,huo_synseg-net_2019,kamnitsas_unsupervised_2017,jog_pulse_2018}. Recent papers have also shown that spatial and intensity data augmentation can improve network robustness~\cite{chaitanya_semi-supervised_2019,zhao_data_2019}. Although these approaches make segmentation CNNs adaptive to brain scans of observed contrasts, they remain limited to the modalities (real or simulated) present in the training data, and thus have reduced accuracy when tested on previously unseen MR contrasts.

To address modality-agnostic learning-based segmentation, a CNN was recently used to quickly solve the inference problem within the Bayesian segmentation framework~\cite{dalca_unsupervised_2019}. However, this method cannot be directly used to segment test scans of arbitrary contrasts, as it requires training on a set of unlabeled, preprocessed scans for each target modality.

We present \netname{}, a novel learning strategy that enables automatic segmentation of \textit{unpreprocessed} brain scans of \emph{any} MRI contrast without any need for paired training data, re-training, or fine-tuning. We train a CNN using a dataset of only segmentation maps: synthetic images are produced by sampling from the generative model used in Bayesian segmentation, conditioned on a segmentation map. By sampling the model parameters randomly at every mini-batch, we expose the CNN to synthetic (and often unrealistic) contrasts during training, and force it to learn features that are inherently contrast-agnostic. We demonstrate \netname{} on four different MRI contrasts. We also show that \netname{} generalizes across datasets of the same contrast better than a CNN trained on real images of this contrast from a specific dataset.
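
To make the sampling idea concrete, the following is a minimal sketch under simplifying assumptions, not a full specification of the generative model. Given a training segmentation map $S = \{S_j\}$ with labels $k = 1, \dots, K$, a new set of per-label intensity parameters is drawn at every mini-batch, and a synthetic image $I = \{I_j\}$ is then sampled voxel-wise conditioned on $S$:
\[
\mu_k \sim \mathcal{U}(a_\mu, b_\mu), \qquad
\sigma_k \sim \mathcal{U}(a_\sigma, b_\sigma), \qquad
I_j \sim \mathcal{N}\!\left(\mu_{S_j}, \sigma^2_{S_j}\right).
\]
The uniform priors and their bounds $a_\mu, b_\mu, a_\sigma, b_\sigma$ are hypothetical choices made here for illustration only, and other components of a Bayesian generative model, such as bias field, are omitted. Since the pair $(I, S)$ is used for training and the parameters $\{\mu_k, \sigma_k\}$ change at every mini-batch, the network never sees the same contrast twice and cannot rely on any fixed mapping between tissue type and intensity.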