\section{Results}

\paragraph{General Remarks.}
Table~\ref{tab:mrnetsota} and Figure~\ref{fig:roccurves} report the performance on the MRNetData dataset obtained by applying \myalgoname\ on top of the MRNet and ELNet pipelines.\footnote{See Appendix~\ref{sec:stats} for the statistical tests.}
Our proposed strategy improves both ACL tear and meniscal tear detection over the baselines, showing that focusing on specific features of the disorder's anatomy is key to improving diagnostic capability.
Specifically, for the ACL tear, the ROC-AUC improvement of \myalgoname\ is 2.2\% over the MRNet baseline and 2.1\% over the ELNet baseline. For the meniscal tear, \myalgoname\ improves the score by 5.5\% and 2.9\%, respectively.
More importantly, the proposed solution enhances the sensitivity of the methods, that is, their ability to detect disorders when they are present. Indeed, we observe an improvement of 12.9\% and 28\% over MRNet and ELNet, respectively, in ACL tear detection, and of 13.9\% and 7.1\% in meniscal tear detection.
The specificity generally decreases by a small margin, which is consistent with the aim of \myalgoname: extracting relevant information about the disorder's anatomy when the disorder is present.
Similar conclusions can be drawn from the results in Table~\ref{tab:kneemrisota}, which show that the capabilities introduced by \myalgoname\ generalize well to similar tasks with different data distributions.
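
For reference, the sensitivity and specificity discussed above follow the standard definitions, written here in terms of the numbers of true positives ($TP$), false negatives ($FN$), true negatives ($TN$), and false positives ($FP$); these symbols are introduced only for this remark:
\[
\mathrm{sensitivity} = \frac{TP}{TP + FN}, \qquad \mathrm{specificity} = \frac{TN}{TN + FP}.
\]
A method that flags more exams as positive lowers $FN$ and typically raises $FP$, which is the trade-off noted above: higher sensitivity at the cost of a small drop in specificity.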

\begin{figure}[t]
\begin{minipage}[b]{.49\textwidth}
\centering
\includegraphics[width=.9\linewidth]{images/roccurves.pdf}
\caption{ROC curves showing the performance of the baseline methods with and without \myalgoname.}
\label{fig:roccurves}
\end{minipage}\hfill%
\begin{minipage}[b]{.49\textwidth}
\centering
\includegraphics[width=\linewidth]{images/views.pdf}
\caption{ \modulename's performance under different $\pyramidlevels$ settings for each of the MRI views. Dashed lines show the original MRNet's results.}
\label{fig:views}
\end{minipage}
\end{figure}






\paragraph{Analysis.}
In this section, we study the sensitivity of \myalgoname\ to different configuration settings.
The results presented in this paper have been obtained on the MRNetData dataset using the \myalgoname-enhanced MRNet with $\pyramidlevels = 5$.

\begin{table}[t]
\begin{minipage}[t]{.49\textwidth}
\input{tables/baselines2.tex}
\end{minipage}\hfill%
\begin{minipage}[t]{.49\textwidth}
\input{tables/ablation2.tex}
\end{minipage}
\end{table}

Table~\ref{tab:baselines} compares our proposed strategy against MRNet executed on multiple sub-regions extracted at the slice level, and against MRNet with \modulename\ applied to the feature maps of its last layer.\footnote{For details on the compared baselines, see Appendix~\ref{sec:mrnet}.} \myalgoname\ performs best, but both alternatives also improve on MRNet, which confirms that extracting localized features of the knee disorder is relevant.

Figure~\ref{fig:views} shows the performance of \myalgoname\ on the different MRI views and for different values of $\pyramidlevels$. For the ACL tear, our solution improves over MRNet for all $\pyramidlevel$, mainly on the coronal and sagittal views.
For the detection of meniscal tears, results on the axial and coronal views always improve.
The performance on the sagittal view increases with larger $\pyramidlevels$ values.
In general, these results suggest that, to achieve better accuracy, the $\pyramidlevels$ values should be selected for the sub-region sizes they generate rather than for the larger number of sub-regions.

In Table \ref{tab:ablation} we present an ablation study over \myalgoname's components. 
As a baseline (first row), we consider \myalgoname\ whose $\{\featuremaps_{i,l}\}_{l=1}^{L}$ are reduced by a global average pooling before performing the series-wise max pooling.
Employing only the FPN architecture improves on the baseline, showing that it enables the exploitation of small-appearing features.
Introducing only the \modulename\ module also improves on the baseline, demonstrating that the capabilities of \myalgoname\ are not due solely to the increased expressive power given by the FPN's weights.
The best performance is achieved when both components are used together.
\begin{table}[t]
\begin{minipage}[t]{.49\textwidth}
\input{tables/exitcomb2.tex}
\end{minipage}\hfill%
\begin{minipage}[t]{.49\textwidth}
\input{tables/pooling2.tex}
\end{minipage}
\end{table}
Table~\ref{tab:exitcomb} shows that taking the maximum of the output probabilities given by \myalgoname\ performs better than computing their average or combining them via a fully connected layer.
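
As an illustration of the first two combination rules, let $p_1, \dots, p_K$ denote the output probabilities being combined; $K$, $p_k$, and $\hat{p}$ are notation assumed here for this remark only and are not part of the method's definition:
\[
\hat{p}_{\max} = \max_{1 \le k \le K} p_k, \qquad \hat{p}_{\mathrm{avg}} = \frac{1}{K} \sum_{k=1}^{K} p_k .
\]
Since $\hat{p}_{\max} \ge \hat{p}_{\mathrm{avg}}$ always holds, the max rule lets a single confident output determine the prediction, whereas the average dilutes it among the remaining outputs.
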
Table~\ref{tab:pooling} reveals that implementing $\poolfun(\cdot)$ as an average rather than a max operation is the better choice for detecting meniscal tears, whereas it makes no particular difference for detecting ACL tears.
This is consistent with the MRI appearance of such tears. Indeed, ACL tears cause a higher signal \citep{Kam} that can also be captured with max operations, while meniscus tears present more balanced signals \citep{Nguyen2014} that are better summarized with average operations.
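
To make the comparison concrete, consider a sub-region whose feature values are $x_1, \dots, x_n$; this notation is assumed here for illustration and is not taken from the definition of $\poolfun(\cdot)$. The two variants can then be sketched as
\[
\poolfun(x_1, \dots, x_n) = \max_{1 \le j \le n} x_j
\qquad \mbox{or} \qquad
\poolfun(x_1, \dots, x_n) = \frac{1}{n} \sum_{j=1}^{n} x_j .
\]
The max variant depends only on the strongest response in the sub-region, while the average variant weights every location equally. This matches the intuition above: a localized high signal survives max pooling, whereas a more evenly distributed signal is better summarized by the average.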


