\clearpage

\appendix

\input{relatedwork.tex}

\section{Details on the Experiments}

\subsection{Architectural Details of the FPN}
\label{sec:details}
\rev{Table \ref{tab:fpnconv} reports the architectural details of the proposed FPN for each of the baseline pipelines used in the experiments.}

\input{tables/conv.tex}

\subsection{Details on the Application of \myalgoname}
\label{sec:expdetails}
\rev{
In this section, we give more details on how we applied the proposed \myalgoname\ strategy to the baselines MRNet \citep{MRNet} and ELNet \citep{ELNet}.
Following the original MRNet paper for the MRNetData dataset, we first trained one single-view model per view (i.e., axial, coronal, and sagittal) with \myalgoname\ applied, for each diagnostic task. The probability predictions of the three models were then combined with a logistic regressor to obtain a single probability estimate of the presence or absence of the knee disorder for each of the training exams. The models were optimized using the original settings: the Adam optimizer \citep{Kingma2014} with a learning rate of $10^{-5}$, reduced by a factor of 0.3 whenever the validation performance plateaued, for 50 epochs. We used the original data augmentation strategy, which applies a random shift of up to 25 pixels, a random rotation of up to 25 degrees, and a random horizontal flip to each MRI exam.}
Similar optimization settings were used for the experiments on the KneeMRI dataset. In this case, we trained a single instance of MRNet with \myalgoname, since this dataset only provides sagittal MRI scans. Weight decay with a factor of $10^{-2}$ was added to the optimization objective. All MRI slices were resized to $256\times256$ pixels. Data augmentation included random scaling by a factor in $[0.9, 1.1]$, random shifts of up to 40 pixels, and random rotations of up to 25 degrees.
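
As an illustration of the multi-view combination used for MRNetData, a logistic regressor over the three single-view predictions can be sketched as follows. The symbols $p_{\mathrm{ax}}$, $p_{\mathrm{cor}}$, and $p_{\mathrm{sag}}$ (single-view probabilities), $w_{\mathrm{ax}}$, $w_{\mathrm{cor}}$, $w_{\mathrm{sag}}$, and $b$ (learned coefficients), and $\sigma$ (logistic sigmoid) are notation introduced only for this sketch:
\[
\hat{p} = \sigma\left(w_{\mathrm{ax}}\, p_{\mathrm{ax}} + w_{\mathrm{cor}}\, p_{\mathrm{cor}} + w_{\mathrm{sag}}\, p_{\mathrm{sag}} + b\right),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}.
\]
For the KneeMRI dataset, where only the sagittal view is available, no such combination is needed.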

\rev{To apply \myalgoname\ to ELNet's backbone, we trained one instance on the axial view for the ACL tear task and another on the coronal view for the meniscus tear task, both on the MRNetData dataset. The models were optimized using the original settings: the Adam optimizer \citep{Kingma2014} with a learning rate of $2\times10^{-5}$ for the ACL tear task and of $1.5\times10^{-5}$ for the meniscus tear task, for 200 epochs. Layer normalization \citep{Ba2016} was used for the ACL tear task and contrast normalization \citep{Ulyanov2016} for the meniscus tear task, both with the factor $K$ set to 4. We applied the original data augmentation strategy, which includes a random shift of up to 25 pixels, random scaling by a factor in $[0.9, 1.1]$, a random rotation of up to 10 degrees, a random horizontal flip, and a random rotation by a multiple of 90 degrees.}
For the experiments on the KneeMRI dataset, we used the SGD optimizer with a momentum of 0.9 and a learning rate of $5\times10^{-5}$. Dropout layers with a factor of 0.5 were added, as in the original ELNet. All MRI slices were resized to $256\times256$ pixels. Data augmentation included random scaling by a factor in $[0.9, 1.1]$, random shifts of up to 25 pixels, and random rotations of up to 10 degrees.
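
For reference, one common formulation of SGD with momentum (the convention adopted, for instance, by PyTorch) is sketched below. The symbols $\theta_t$ (parameters), $g_t$ (mini-batch gradient), $v_t$ (velocity), $\mu$ (momentum), and $\eta$ (learning rate) are introduced only for this sketch, and other frameworks may use a slightly different variant:
\[
v_{t+1} = \mu\, v_t + g_t,
\qquad
\theta_{t+1} = \theta_t - \eta\, v_{t+1},
\]
with $\mu = 0.9$ and $\eta$ equal to the learning rate stated above.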


\section{Statistical Tests}
\label{sec:stats}
We performed McNemar's test \citep{McNemar} to assess the statistical significance of the improvements obtained with the proposed methodology. For the sensitivity comparison between MRNet and MRNet with \myalgoname, the p-values are 0.0003 and 0.0005 for the ACL tear and the meniscus tear, respectively. For the sensitivity comparison between ELNet and ELNet with \myalgoname, the p-values are 0.0003 and 0.016 for the ACL tear and the meniscus tear, respectively.
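
For completeness, we recall one standard form of McNemar's statistic. Let $b$ be the number of positive exams detected by the baseline but missed by the model with \myalgoname, and $c$ the number of positive exams in the opposite situation ($b$ and $c$ are notation introduced here). The statistic is
\[
\chi^2 = \frac{(b - c)^2}{b + c},
\]
which, under the null hypothesis of equal sensitivity, approximately follows a chi-squared distribution with one degree of freedom. Variants with a continuity correction, $(|b - c| - 1)^2 / (b + c)$, or an exact binomial test on $b$ and $c$ are also common. The formula above is a sketch and not necessarily the exact variant used to obtain the reported values.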

\section{Details on the Compared Baselines that Use Disorder Information}
\label{sec:mrnet}
We compare \myalgoname\ with other baseline strategies that aim to extract information on the anatomy of the knee disorder.
The first baseline (first row of Table \ref{tab:baselines}) generates, at slice level, a list of sub-region sizes $(\width_\pyramidlevel, \height_\pyramidlevel)$, with $\pyramidlevels = 5$. These are used to crop each slice $\slice_i$ around its center coordinates. Each cropped slice is resized to $256\times256$ pixels and fed to MRNet's AlexNet, which produces $\pyramidlevels$ feature vectors (one for each cropped slice). These vectors are concatenated
before the series-wise max-pooling operation. Finally, a fully connected layer predicts the probability of the presence or absence of the disorder.
The second baseline (second row of Table \ref{tab:baselines}) simply applies the proposed \modulename\ to the feature maps $\featuremaps_{i,5}$ produced by MRNet's AlexNet, before the series-wise max-pooling.
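
The first baseline can be summarized compactly as follows. This is a sketch in notation introduced only here: $c_{i,\pyramidlevel}$ denotes the center crop of $\slice_i$ of size $(\width_\pyramidlevel, \height_\pyramidlevel)$ resized to $256\times256$ pixels, $\phi$ the AlexNet feature extractor, $\mathbf{w}$ and $b$ the parameters of the fully connected layer, and $\sigma$ the logistic sigmoid (the sigmoid output is an assumption consistent with a binary presence/absence prediction):
\[
\mathbf{v}_i = \left[\phi(c_{i,1});\, \dots;\, \phi(c_{i,\pyramidlevels})\right],
\qquad
\mathbf{v} = \max_{i}\, \mathbf{v}_i,
\qquad
\hat{p} = \sigma\left(\mathbf{w}^{\top}\mathbf{v} + b\right),
\]
where $[\,\cdot\,;\,\cdot\,]$ denotes concatenation and the maximum is taken element-wise over the slices of the series.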

\section{Additional Results}
\rev{
Since \myalgoname\ exploits features localized in particular areas of the MRI slices, we assessed its robustness to vertical and horizontal shifts of the slices, as such shifts change the spatial location of these features and could therefore affect \myalgoname's capabilities. Specifically, we applied random vertical and horizontal translations of up to 20\% of the slice size (around 50 pixels) to the MRI scans of the MRNetData validation set. In this setting, \myalgoname\ applied to the baseline MRNet achieves a ROC-AUC and sensitivity of $0.967 \pm 0.009$ and $0.765 \pm 0.021$ for ACL tear detection, and of $0.879 \pm 0.005$ and $0.840 \pm 0.044$ for meniscal tear detection. These results are slightly lower than those presented in Table \ref{tab:mrnetsota}, but remain higher than those of the baseline MRNet. We also ran the same experiment on the original MRNet, which achieves a ROC-AUC and sensitivity of $0.951 \pm 0.001$ and $0.685 \pm 0.032$ for ACL tear detection, and of $0.839 \pm 0.015$ and $0.718 \pm 0.044$ for meniscal tear detection. These results are also lower than those of the original baseline without shifts, which are reported in Table \ref{tab:mrnetsota}. Overall, these outcomes show that part of the error committed by \myalgoname\ is inherited from the baseline architecture. Nevertheless, given the limited performance drop, we conclude that \myalgoname\ is rather robust to changes in the spatial location of abnormalities.}
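
As a quick worked check of the figures above (computed only from the reported means, ignoring the standard deviations), the margin of \myalgoname\ over the baseline MRNet under the shifted setting is
\[
\begin{array}{lll}
\mbox{ACL tear:} & \mbox{ROC-AUC } 0.967 - 0.951 = 0.016, & \mbox{sensitivity } 0.765 - 0.685 = 0.080, \\
\mbox{Meniscal tear:} & \mbox{ROC-AUC } 0.879 - 0.839 = 0.040, & \mbox{sensitivity } 0.840 - 0.718 = 0.122.
\end{array}
\]
Similarly, assuming slices of $256$ pixels per side (an assumption made for this check), the maximum translation is $0.2 \times 256 = 51.2$ pixels, consistent with the value of around 50 pixels stated above.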

\section{Limitations}
\rev{Tables \ref{tab:mrnetsota} and \ref{tab:kneemrisota} show that applying \myalgoname\ generally reduces the specificity. We believe that this is due to \myalgoname's architecture, which introduces an inductive bias towards better representing features related to abnormalities.
For example, the FPN introduces a bias towards exploiting features that appear small. In this sense, \myalgoname\ is designed to exploit information about abnormalities only when they are actually present. Future work will be devoted to reducing the gap in specificity.}
\rev{Moreover, we remark that the proposed \modulename\ strategy should be regarded as a generic baseline for implementing a prior over the anatomy of knee disorders. Indeed, based on the observation that abnormalities appear in particular areas of the MRI slices, we showed that our PDP module captures relevant information that improves the diagnoses. We believe that more sophisticated strategies exploiting additional cues on the knee's anatomy could further improve the results presented in this paper.}

