\begin{abstract}
Deep unsupervised generative models are regarded as a promising alternative to their supervised counterparts for MRI-based lesion detection. They offer a principled approach to detecting unseen types of anomalies without relying on large amounts of expensive ground-truth annotations. To this end, deep generative models are trained exclusively on data from healthy patients and detect lesions as \ac{ood} data at test time (i.e., samples assigned a low likelihood). While this is a promising way of bypassing the need for costly annotations, this work demonstrates that it also renders this widely used unsupervised anomaly detection approach particularly vulnerable to non-lesion-based \ac{ood} data (e.g., data from different sensors). Since models are likely to be exposed to such \ac{ood} data in production, it is crucial to employ safety mechanisms that filter out such samples and run inference only on inputs for which the model can provide reliable results. First, we show extensively that conventional unsupervised anomaly detection mechanisms fail when presented with true \ac{ood} data. Second, we apply prior knowledge to disentangle lesion-based \ac{ood} data from its non-lesion-based counterpart.
\end{abstract}

%\begin{keywords}
%Unsupervised Lesion Detection, Out-of-Distribution Detection
%\end{keywords}


