\documentclass{midl} 
\makeatletter
\def\set@curr@file#1{\def\@curr@file{#1}} %temp workaround for 2019 latex release
\makeatother
\usepackage{mwe} 
\usepackage{float}
\jmlryear{2020}
\jmlrworkshop{Full Paper -- MIDL 2020}


\title[Training segmentation networks on texture-encoded input]{Training deep segmentation networks on texture-encoded input: application to neuroimaging of the developing neonatal brain}


\midlauthor{\Name{Ahmed E. Fetit\nametag{$^{}$}} \Email{a.fetit@imperial.ac.uk}\\
\Name{John Cupitt\nametag{$^{}$}}
\Email{j.cupitt@imperial.ac.uk}\\
\Name{Turkay Kart\nametag{$^{}$}}
\Email{t.kart@imperial.ac.uk}\\
\Name{Daniel Rueckert\nametag{$^{}$}} \Email{d.rueckert@imperial.ac.uk}\\
\\
\addr Biomedical Image Analysis Group, Department of Computing, Imperial College London, London, SW7 2AZ, United Kingdom.\\
}
\begin{document}
\maketitle
\begin{abstract}
Standard practice for using convolutional neural networks (CNNs) in semantic segmentation tasks assumes that the image intensities are directly used for training and inference. In natural images this is performed using RGB pixel intensities, whereas in medical imaging, e.g. magnetic resonance imaging (MRI), gray level pixel intensities are typically used. In this work, we explore the idea of encoding the image data as local binary textural maps prior to feeding them to CNNs, and show that accurate segmentation models can be developed using such maps alone, without learning any representations from the images themselves. This questions the common consensus that CNNs recognize objects from images by learning increasingly complex representations of shape, and suggests a more important role for image texture, in line with recent findings on natural images. We illustrate this for the first time on neuroimaging data of the developing neonatal brain in a tissue segmentation task, by analyzing a large, publicly available set of T2-weighted MRI scans ($n=558$, range of postmenstrual ages at scan: 24.3--42.2 weeks) obtained retrospectively from the \textit{Developing Human Connectome Project} cohort. Rapid changes in visual characteristics that take place during early brain development make it important to establish a clear understanding of the role of visual texture when training CNN models on neuroimaging data of the neonatal brain; this remains a largely understudied but important area of research. From a deep learning perspective, the results suggest that CNNs could simply be capable of learning representations from structured spatial information, and may not necessarily require conventional images as input. 
\end{abstract} 

\begin{keywords}
Segmentation, convolutional neural networks, local binary patterns, texture, neuroimaging, neonatal, developing brain.
\end{keywords}

\section{Introduction}
One widely accepted explanation of the effectiveness of convolutional neural networks (CNNs) in classification and semantic segmentation tasks is the so-called \textit{shape hypothesis} \cite{geirhos2018imagenettrained}: low-level shape features are combined in increasingly complex hierarchies until the object can be readily classified or detected \cite{lecun2015deep}. Whilst this hypothesis is supported by a number of empirical findings \cite{zeiler2014visualizing, ritter2017cognitive}, recent work in the machine learning literature suggests an important role for visual \textit{texture} in object recognition tasks. For instance, Brendel and Bethge showed that CNNs can achieve high classification accuracy on the publicly available ImageNet \cite{ILSVRC15} data in settings where the model is effectively constrained to recognizing local textural patches \cite{brendel2018approximating}. Recent analysis by Geirhos and colleagues also supports this claim, and illustrates that ImageNet-trained CNNs are strongly biased towards the recognition of textural representations as opposed to shapes, an observation termed the \textit{texture hypothesis} by the authors \cite{geirhos2018imagenettrained}. A preliminary study by Pawlowski and Glocker also supports the texture hypothesis in the context of medical imaging, and shows that textural information alone could indeed be sufficient for regression and classification tasks when using T1-weighted brain MRI \cite{pawlowski:MIDLAbstract2019a}.

We contribute to the ongoing debate on the role of texture in deep learning within the context of neuroimaging of the developing brain. This is an important medical application: visual characteristics of neonatal and fetal brains are significantly different from those of adult brains in terms of size, morphology, and white/gray matter intensities \cite{Makropoulos2018TheReconstruction}, i.e. in terms of both visual shape and texture. Importantly, changes in visual characteristics occur rapidly during brain development as a result of the continuous decrease of water content within the brain and the process of myelination \cite{serag2013spatio} (Figure \ref{fig:axialslices}).

\begin{figure}[h]
  \centering
  \includegraphics[width=0.8\linewidth]{neonatal-brain-t2.png}  \caption{Axial slices from T2-weighted brain MRI scans of neonates, postmenstrual ages at scan: 32, 34, 36, 38, and 40 weeks, respectively. Scans were obtained from the publicly available \textit{dHCP} neonatal data set. One can see changes in visual texture rapidly taking place throughout brain development, e.g. as the brain matures, the occurrence of dark intensities in white matter regions gradually increases.}
\label{fig:axialslices}
\end{figure}

Studies in the neuroimaging literature suggest that changes in brain texture that take place during development are quantifiable. For instance, the developmental model presented by Toews et al. \cite{toews2012feature} hypothesized the existence of distinctive anatomical properties that can be localized in space and time, and can be used to represent structural development in neonatal MRI; conventional scale-invariant features proposed in \cite{lindeberg1998feature} and \cite{lowe2004distinctive} were used to successfully achieve this modeling. In fact, 3D scale-invariant features were recently shown to be highly effective in capturing a key-point signature `brainprint' that could identify similarities in scans corresponding to unique adult subjects despite ageing and neurodegenerative disease progression \cite{chauvin2020neuroimage}, suggesting that both variations and consistencies in brain texture are quantifiable. In addition to modeling the healthy brain, quantifying image textural patterns has shown success in the characterisation of pathology, as reported in a number of studies in paediatric neuro-oncology \cite{gutierrez2014metrics, orphanidou2014texture, fetit2015three}. It is therefore important to establish a clear understanding of the role of visual texture when training deep learning models on neuroimaging data of the developing brain.

Our analysis empirically shows that deep CNNs can be trained to a high level of accuracy on complex neuroimaging segmentation tasks without being exposed to the underlying imaging data, but rather to explicit representations of the image's texture. We achieve this by encoding the imaging data as visual textural maps using the computationally simple \textit{Local Binary Patterns (LBP)} algorithm \cite{ojala2002multiresolution}. We then train, validate, and test the networks directly on the resulting encoded maps, in a tissue segmentation problem. To this end, we use publicly available T2-weighted MRI scans of the developing neonatal brain obtained retrospectively from the \textit{Developing Human Connectome Project (dHCP)}\footnote{\href{http://www.developingconnectome.org}{developingconnectome.org}} cohort \cite{Makropoulos2018TheReconstruction, bastiani2019automated}. 

\section{Contribution and overview}

This study is not the first to incorporate aspects of LBP with deep networks. Work in the remote sensing literature looked into using a two-stream strategy for designing a CNN, where texture-encoded images were used as an additional stream that is fused with a standard RGB image pathway \cite{anwer2018binary}. Work in computer vision illustrated that CNNs could be trained directly on LBP textural maps to achieve high levels of accuracy on face recognition tasks \cite{zhang2017face}. More recent work proposed the notion of local binary pattern \textit{networks}, which use binary operations as opposed to convolutions, and illustrated their utility on optical character recognition tasks \cite{lin2019local}. 

To the best of our knowledge, however, our analysis is the first  to demonstrate that CNNs could be used directly on explicit LBP textural maps in the complex task of image segmentation of developing human brain tissues, and on neuroimaging datasets in general. By developing accurate tissue segmentation models on explicit textural maps, the work offers two contributions to the fields of neuroimaging and deep learning:
\begin{itemize}
\item Firstly, it takes a step towards understanding the role of visual texture when training deep segmentation CNNs on heterogeneous neuroimaging data of the developing brain; an important area of research that has not been previously explored.
\item Secondly, it contributes to the understanding of the inner workings of CNNs by showing empirical results that support the texture hypothesis, in line with recent findings on natural images. Evaluating these results suggests that CNNs do not necessarily require conventional images as input, and they may simply be capable of learning representations from well-structured spatial information.
\end{itemize}

In this regard, it is important to stress that our focus in this particular study is not on improving segmentation accuracy using texture-encoding, but rather on showing that it is possible to achieve good performance using only texture-encoded maps as input to the CNNs for this complex neuroimage segmentation task.

\section{Materials and Methods}
\subsection{Image acquisition and pre-processing} 
A total of 558 three-dimensional, T2-weighted MRI scans were obtained retrospectively from the publicly available \textit{Developing Human Connectome Project (dHCP)} neonatal cohort. Acquisition was carried out using a 3T Philips scanner, following the protocol described in \cite{Makropoulos2018TheReconstruction}. Data were available in NIfTI format. Gray-level intensities were normalized to zero mean and unit variance within each scan. The scans have associated tissue labels that were generated using an automated segmentation pipeline. The pipeline was specifically designed for neonatal brain MRI using the well-established \textit{Draw-EM} \cite{Makropoulos2014AutomaticBrain} framework and was discussed in \cite{Makropoulos2018TheReconstruction}. The labels were used as ground truth annotations and can be summarized as follows:
1. background (zero-intensity pixels), 2. cerebrospinal fluid (CSF), 3. cortical gray matter (cGM), 4. white matter (WM), 5. background bordering brain tissues, 6. ventricles, 7. cerebellum, 8. deep gray matter (dGM), 9. brainstem, and 10. hippocampus.
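
\noindent As a brief illustration, the per-scan intensity normalization described above can be written as follows. The symbols $I_s(v)$, $\mu_s$, and $\sigma_s$ are introduced here for illustration only, and the expression assumes that the mean $\mu_s$ and standard deviation $\sigma_s$ are computed over all voxels $v$ of scan $s$; a masked variant would restrict both statistics to brain voxels.
\[
\tilde{I}_s(v) = \frac{I_s(v) - \mu_s}{\sigma_s},
\]
so that each normalized scan $\tilde{I}_s$ has zero mean and unit variance.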

\subsection{Structuring the dataset}
Model-development set: 470 neonatal scans were included with the purpose of developing and optimizing models capable of segmenting developing brain tissues. Of the 470 scans, 450 were assigned for model training and 20 were used for validation throughout the training cycles. Subjects' postmenstrual age range was 24.7--42.1 weeks for the training data, and 27.6--42.2 weeks for the validation data. Held-out test set: 88 additional scans were completely held out from the model-development set; their postmenstrual age range was 24.3--42 weeks.

\subsection{Generating LBP texture maps}
In essence, the \textit{LBP} algorithm assumes that the visual texture of an image can be characterized using two complementary measures: local spatial patterns and gray-level contrast \cite{Pietikainen:2010}. \textit{LBP} is invariant to monotonic changes in gray-level intensity and is computationally simple. It first considers the neighborhood of a given pixel of interest; comparing the intensity of each neighboring pixel with that of the pixel of interest then generates a systematic code that summarizes the local texture within the given pixel's neighborhood \cite{ojala2002multiresolution}. By computing a pixel-wise LBP code across the image, a local-texture map can be produced, where every unit of the map is a representation of the spatial variation of intensities in the corresponding pixel's local neighborhood on the original image. The LBP code for a given pixel of interest can be formulated as:

\begin{equation}
LBP(x_{c},y_{c})=\sum\limits_{p=0}^{P-1} f(i_{p}-i_{c})\,2^{p},
\label{eq:LBPcode}
\end{equation}
\noindent where $P$ is the number of sampling points, $i_{c}$ is the gray-level intensity of the pixel of interest defined by coordinates ($x_{c}, y_{c}$), and $i_{p}$ is the gray-level intensity of the \textit{p}th surrounding pixel. The thresholding function $f(x)$ is straightforward to compute:
\begin{equation}
f(x)  =
\left\lbrace \begin{array}{ll}
1, & \mbox{if } x\geq0, \\
0, & \mbox{otherwise.} \\
\end{array} \right.
\label{eq:lbpfx}
\end{equation}
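
\noindent As a worked illustration of Equations (\ref{eq:LBPcode}) and (\ref{eq:lbpfx}), consider a hypothetical $3\times3$ patch with $P=8$ neighbors, indexed clockwise from the top-left pixel ($p=0$) to the left pixel ($p=7$). Both the intensity values and the neighbor ordering are assumptions made for this example only. Thresholding each neighbor against the central intensity $i_{c}=6$ gives
\[
\left[\begin{array}{ccc} 6 & 5 & 2 \\ 7 & \mathbf{6} & 1 \\ 9 & 8 & 7 \end{array}\right]
\;\longrightarrow\;
\left[\begin{array}{ccc} 1 & 0 & 0 \\ 1 & \cdot & 0 \\ 1 & 1 & 1 \end{array}\right],
\]
so the thresholded neighbors are $(1,0,0,0,1,1,1,1)$ for $p=0,\dots,7$, and
\[
LBP(x_{c},y_{c}) = 2^{0} + 2^{4} + 2^{5} + 2^{6} + 2^{7} = 241.
\]
A different starting neighbor or direction permutes the bits and therefore changes the code value, but not the underlying binary pattern.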

We applied uniform LBP operators to the original (pre-normalization) MRI scans using a $3\times3$ pixel neighborhood offset\footnote{Code is \href{https://www.github.com/afetit/lbp-encoding}{publicly available on GitHub}.}. Two versions of the maps were computed using radius values of 1 and 10 pixels, respectively. \textit{Scikit-image}'s local\_binary\_pattern function \cite{van2014scikit} was used to carry out the computations. The output of the algorithm is an LBP map that has the same dimensions as the input image; see Figure \ref{fig:lbpexamples}. 
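
\noindent The encoding step can be sketched in a few lines of plain Python. This is a minimal illustration of the basic 8-neighbor operator at radius 1, not the \textit{scikit-image} routine used in this work: the function names, the clockwise neighbor ordering, and the choice to leave border pixels at zero are all assumptions made for the example, and larger radii would additionally require sampling points on a circle. The helper \texttt{is\_uniform} checks the usual uniformity criterion (at most two circular 0/1 transitions in the bit pattern).

\begin{verbatim}
```python
# Basic 8-neighbor LBP (P = 8, radius 1) on a 2D image stored as nested lists.
# Assumptions for this sketch: neighbors are indexed clockwise from the
# top-left pixel, and border pixels (incomplete neighborhood) are left at 0.

NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                    (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """LBP code of one interior pixel: sum of f(i_p - i_c) * 2**p."""
    center = image[row][col]
    code = 0
    for p, (d_row, d_col) in enumerate(NEIGHBOR_OFFSETS):
        if image[row + d_row][col + d_col] - center >= 0:  # f(x) = 1
            code += 2 ** p
    return code


def lbp_map(image):
    """Texture map with the same dimensions as the input image."""
    n_rows, n_cols = len(image), len(image[0])
    texture = [[0] * n_cols for _ in range(n_rows)]
    for row in range(1, n_rows - 1):
        for col in range(1, n_cols - 1):
            texture[row][col] = lbp_code(image, row, col)
    return texture


def is_uniform(code, num_points=8):
    """True if the circular bit pattern has at most two 0/1 transitions."""
    bits = [(code >> p) & 1 for p in range(num_points)]
    # bits[p - 1] wraps around to the last bit when p == 0
    transitions = sum(bits[p] != bits[p - 1] for p in range(num_points))
    return transitions <= 2


# Toy image with a vertical step edge between columns 1 and 2
step_edge = [[1, 1, 5, 5],
             [1, 1, 5, 5],
             [1, 1, 5, 5],
             [1, 1, 5, 5]]
# Same image after an increasing intensity transform (2 * value + 3)
rescaled = [[2 * value + 3 for value in row] for row in step_edge]

print(lbp_map(step_edge))
print(lbp_map(step_edge) == lbp_map(rescaled))
print(is_uniform(62), is_uniform(0b01010101))
```
\end{verbatim}

\noindent In this toy case the interior codes are 255 in the darker column and 62 in the brighter column next to the edge, and the map is unchanged by the intensity transform, which illustrates why the encoding discards absolute gray levels while retaining local spatial structure.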

\begin{figure}[h]
  \centering
  \includegraphics[width=0.5\linewidth]{t2andlbp.png}
  \caption{(a) Example axial slices from a T2-weighted scan in the \textit{dHCP} neonatal cohort, (b) LBP map computed using a 1-pixel radius, (c) LBP map computed using a 10-pixel radius, and (d) 10-class tissue labels generated by the \textit{dHCP} structural pipeline and used as ground-truth for training segmentation models.}
  \label{fig:lbpexamples}
\end{figure}

\subsection{Training deep segmentation CNNs}
We used the open-source \textit{DeepMedic} framework (v0.7.0) \cite{Kamnitsas2017EfficientSegmentation} to train 10-class tissue segmentation networks using the tissue labels generated by the \textit{dHCP} structural pipeline as ground truth. \textit{DeepMedic} uses multiple complementary pathways; the primary one encodes local features, whereas the additional ones capture higher-level contextual information \cite{Kamnitsas2017EfficientSegmentation}. In our set-up, the primary pathway had 8 layers, and each layer used a kernel dimension of $3\times3\times3$. Residual connections were also used between the following layer pairs: 3 and 4, 5 and 6, 7 and 8. Two parallel sub-sampling pathways were used (each 8 layers deep), with sub-sampling factors of [3, 3, 3] and [5, 5, 5] along the [x, y, z] axes, respectively. Each of the training cycles comprised 100 epochs, each consisting of 20 sub-epochs. In every sub-epoch, images were loaded from 5 cases, and 1000 segments were extracted in total. The training batch size was set to 5. The initial learning rate was set to 0.001 and was halved at predefined points using a scheduler (epochs 17, 22, 27, 32, 37, 42, 47, 52). The dataset was augmented by altering the mean and standard deviation of training samples. Training was accelerated using an NVIDIA Tesla K80 graphics processing unit (GPU). Segmentation performance was quantified using the Dice similarity coefficient (DSC). When training directly on gray level intensities, the normalized MRI scans were used. 
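
\noindent For reference, the per-class DSC used for evaluation can be sketched as below. This is an illustrative implementation of the standard definition, $2|A \cap B| / (|A| + |B|)$ for predicted and reference voxel sets $A$ and $B$ of a given class, and not the evaluation code of the \textit{DeepMedic} framework. The function name \texttt{dice\_per\_class}, the flat label sequences, and the convention of returning \texttt{None} for a class absent from both segmentations are assumptions made for the example.

\begin{verbatim}
```python
def dice_per_class(predicted, reference, num_classes):
    """Return one Dice similarity coefficient per class label.

    predicted, reference: equal-length flat sequences of integer labels.
    A class absent from both sequences gets None (DSC is undefined there).
    """
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")
    scores = []
    for k in range(num_classes):
        pred_k = sum(1 for p in predicted if p == k)
        ref_k = sum(1 for r in reference if r == k)
        overlap = sum(1 for p, r in zip(predicted, reference)
                      if p == k and r == k)
        denominator = pred_k + ref_k
        scores.append(2 * overlap / denominator if denominator else None)
    return scores


# Toy example with three classes and eight voxels
reference = [0, 0, 1, 1, 1, 2, 2, 2]
predicted = [0, 0, 1, 1, 2, 2, 2, 2]
scores = dice_per_class(predicted, reference, num_classes=3)
print(scores)
```
\end{verbatim}

\noindent In the toy example a single voxel of class 1 is mislabeled as class 2, which lowers the DSC of both classes while class 0 remains at 1. For a volumetric scan, the label volumes would first be flattened to one sequence per subject.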

\section{Results and Discussion}
\subsection{Results on validation data}
 When applied to the 20 full scans in the validation set, the model trained directly on gray level intensities achieved the following per-class DSC values across all subjects: 
 \\
 \\
 DSC: [ 0.9903, 0.8954, 0.9207, 0.9356, 0.8765, 0.7719, 0.8893, 0.9243, 0.9058, 0.7470 ].
 \\
 \\
 This made use of labels generated by the well-validated \textit{dHCP} structural pipeline in place of ground truth annotations. The DSC values corresponded to the following 10 classes: 
 \\
 \\
 Classes: [ zero-intensity background, CSF, cGM, WM, background bordering brain tissues, ventricles, cerebellum, dGM, brainstem, hippocampus ].
 \\
 \\
 The CNN trained on 10-pixel radius LBP maps achieved comparably high validation performance on the texture-encoded version of the data. For the zero-intensity background, CSF, cGM, WM, ventricles, cerebellum, and dGM classes, the DSC values were at most about 4 percentage points lower than those achieved with the T2-weighted scans. Performance on the background bordering brain tissues, brainstem, and hippocampus classes was also good, but lower by roughly 5 to 11 percentage points than that achieved using the T2-weighted scans: 
 \\
 \\
 DSC: [ 0.9841, 0.8776, 0.8804, 0.9160, 0.8285, 0.7443, 0.8643, 0.8873, 0.8476, 0.6374 ].
 \\
 \\
 The CNN trained on 1-pixel radius maps still maintained high DSC values on validation data for most classes, but showed a substantial drop for the brainstem class and almost no overlap for the hippocampus class: 
 \\
 \\
 DSC: [ 0.9781, 0.8573, 0.8865, 0.9128, 0.7867, 0.6680, 0.7034, 0.7692, 0.5408, 0.0026 ]. Note that for all three CNNs, inference was carried out on the corresponding version of the data, e.g. the network trained on 1-pixel radius maps was validated on the 1-pixel radius versions of the validation set.

\subsection{Results on held-out test data}
We then evaluated the performance of the three models on 88 volumes in a completely held-out test set (for an example, see Figure \ref{fig:test-time-1}). The results showed that all three CNNs achieved high DSC values on data completely unseen by the models before test time, with the exception of the brainstem and hippocampus tissue classes when the CNN trained on 1-pixel radius LBP maps was used:
\\
\\
Using gray level intensities:
\\
DSC: [ 0.9919, 0.9196, 0.9376, 0.9525, 0.8921, 0.8043, 0.9319, 0.9357, 0.9183, 0.7804 ].
\\
Time for testing process: 11,193 seconds.
\\
\\
Using 10-pixel radius LBP maps:
\\
DSC: [ 0.9869, 0.8825, 0.8949, 0.9221, 0.8458, 0.7610, 0.8954, 0.8926, 0.8377, 0.6505 ].
\\
Time for testing process: 10,807 seconds.
\\
\\
Using 1-pixel radius LBP maps:
\\
DSC: [ 0.9823, 0.8688, 0.9038, 0.9232, 0.8104, 0.6692, 0.7435, 0.7894, 0.5319, 0.0019 ].
\\
Time for testing process: 11,101 seconds.
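
\noindent As a reading aid for the test-set figures above, the following sketch tabulates the per-class drop in DSC of each LBP-trained network relative to the network trained on gray level intensities. The DSC values are copied from the lists above; the dictionary keys, the shortened class names, and the helper functions are hypothetical names introduced for this example.

\begin{verbatim}
```python
CLASSES = ["background", "CSF", "cGM", "WM", "border background",
           "ventricles", "cerebellum", "dGM", "brainstem", "hippocampus"]

# Held-out test DSC per class, copied from the reported results
TEST_DSC = {
    "gray": [0.9919, 0.9196, 0.9376, 0.9525, 0.8921,
             0.8043, 0.9319, 0.9357, 0.9183, 0.7804],
    "lbp_r10": [0.9869, 0.8825, 0.8949, 0.9221, 0.8458,
                0.7610, 0.8954, 0.8926, 0.8377, 0.6505],
    "lbp_r1": [0.9823, 0.8688, 0.9038, 0.9232, 0.8104,
               0.6692, 0.7435, 0.7894, 0.5319, 0.0019],
}


def drops(model, baseline="gray"):
    """Per-class DSC drop relative to the baseline, in percentage points."""
    return [round(100 * (b - m), 2)
            for b, m in zip(TEST_DSC[baseline], TEST_DSC[model])]


def largest_drop(model):
    """Name of the class with the largest DSC drop for the given model."""
    per_class = drops(model)
    worst = max(range(len(CLASSES)), key=lambda k: per_class[k])
    return CLASSES[worst]


for model in ("lbp_r10", "lbp_r1"):
    print(model, drops(model), "largest drop:", largest_drop(model))
```
\end{verbatim}

\noindent For both LBP-trained networks the largest drop occurs for the hippocampus class, and it is far larger for the 1-pixel radius maps than for the 10-pixel radius maps.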

\subsection{Discussion}
Visual inspection of the results suggested that the CNN trained on 10-pixel radius LBP maps produced smoother segmentation maps than the one trained on 1-pixel radius maps; this indicates a trade-off between the complexity of the computed LBPs and how refined the segmentation output is. Additionally, the drop in performance on the hippocampus and brainstem classes when using a radius of 1 pixel as opposed to 10 suggests that the granularity of the textural maps also has a direct effect on capturing changes in intensity within the classes of interest. Nevertheless, the performance of all three networks appeared invariant to postmenstrual age (PMA) at scan, despite differences in texture and shape patterns across scans acquired at different ages.

\begin{figure}[h]
  \centering
  \includegraphics[width=0.7\linewidth]{test-time-1.png}
  \caption{Example segmentation performance on (a) T2-weighted axial slice, using CNNs trained with (b) gray level intensities, (c) LBP maps computed using a 10-pixel radius, and (d) LBP maps computed using a 1-pixel radius. }
  \label{fig:test-time-1}
\end{figure}


To the best of our knowledge, these findings are the first to show that deep CNNs can be trained directly on textural maps to achieve successful segmentation performance on highly heterogeneous neonatal brain MRI data, and on neuroimaging datasets in general. From a neuroimaging perspective, this is a crucial first step towards understanding the role of visual texture when training deep segmentation models on datasets of the developing brain. From a deep learning perspective, the findings question the common consensus that CNNs perform well in computational perception tasks by learning complex shape hierarchies, and suggest a more important role for texture, at least in semantic segmentation tasks. Additionally, since each unit on an LBP map is a representation of the textural neighborhood of the corresponding pixel in the original image, achieving segmentation success on LBP maps suggests that CNNs could simply be capable of learning representations from structured information, and do not necessarily require conventional images as input; this is an interesting area to explore in future work. 

In terms of future work, it will also be interesting to explore whether the findings can generalize to other measures of texture, or whether they are specific to the LBP algorithm. Additionally, exploring the use of shape filters will be a natural next step. Further, it will be interesting to vary the complexity of the segmentation task by experimenting with more detailed segmentation maps that are also publicly available from \textit{dHCP}, as exploring the relationship between the complexity of the segmentation task and the radius of the textural neighbourhood could give further interesting insights. 

Having explicit textural and shape inputs to a CNN directly links to an active area of research referred to as representation disentanglement; please refer to the work by van Steenkiste et al. for an empirical study on the topic \cite{van2019disentangled}, albeit not on textures. If the types of representations learned by complex networks could be disentangled easily, this may have a direct impact on model performance when the training data is perturbed or when the application domain is shifted, potentially resulting in improved robustness against changes in gray level intensities, image acquisition protocols, or scanner hardware. Exploring this application within the context of developing brain MRI will drive our future efforts. 

\section{Conclusion}
We illustrated the feasibility of training deep segmentation CNNs on texture-encoded input, using the computationally simple LBP algorithm, on heterogeneous MRI scans of the developing neonatal brain. 

% Acknowledgments---Will not appear in anonymized version
\midlacknowledgments{The research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP/2007-2013)/ERC Grant Agreement no. 319456. We are grateful to the families who generously supported this trial. This research was also supported by the UK Research and Innovation London Medical Imaging and Artificial Intelligence Centre for Value Based Healthcare.}
\bibliography{Fetit20}



\end{document}