\section{Experimental Settings}

\subsection{Applications}
We integrated the \myalgoname\ architecture into the pipelines of the state-of-the-art MRNet \citep{MRNet} and ELNet \citep{ELNet} methods. For both, we kept the original pipeline configurations and hyper-parameter settings for training and inference.

For MRNet, we took the features $\{\featuremaps_{i,l}\}_{l=1}^{L}$ to be the outputs of AlexNet's max-pool1, max-pool2, conv3, conv4, and conv5 layers, respectively. Each lateral connection is implemented as a $1 \times 1$ convolutional layer with ReLU activation, and the convolutional module following the up-sampling layer is implemented as a convolutional layer with kernel size 3 and ReLU activation.\footnote{\label{note1}\rev{Further details are given in Appendix \ref{sec:details}.}}
Each pooled vector $\pooledvec_l$, $l = 1, \dots, L$, is fed to an independent fully connected layer with a single output node and sigmoid activation.
The value of $\pyramidlevels$ was configured separately for each MRI view, based on the results shown in Figure \ref{fig:views}.
For $\loss(\modeloutput_l, \groundtruth)$, we used the binary cross-entropy loss weighted according to the class distribution of the samples.
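A minimal NumPy sketch of the top-down pathway described above follows; it is an illustration under stated assumptions, not our released code. It operates on a single slice with (channels, height, width) arrays, uses nearest-neighbour up-sampling, projects every level to a common width $D$, and merges the up-sampled and lateral maps by element-wise addition (the merge operation is an assumption here). The function names \texttt{conv1x1}, \texttt{conv3x3}, \texttt{upsample\_nearest}, and \texttt{top\_down} are hypothetical, and the example channel counts and spatial sizes (64, 192, 384, 256, 256 channels at $27 \times 27$ and $13 \times 13$) are those of a torchvision-style AlexNet at max-pool1, max-pool2, conv3, conv4, and conv5 for an assumed $224 \times 224$ input.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv1x1(x, w, b):
    """1x1 convolution (lateral connection): x (C_in, H, W), w (C_out, C_in), b (C_out,)."""
    return np.einsum("oc,chw->ohw", w, x) + b[:, None, None]

def conv3x3(x, w, b):
    """3x3 convolution (cross-correlation, stride 1, zero padding 1), keeps H and W.
    x (C_in, H, W), w (C_out, C_in, 3, 3), b (C_out,)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], xp[:, dy:dy + h, dx:dx + wd])
    return out + b[:, None, None]

def upsample_nearest(x, size):
    """Assumed up-sampling: nearest-neighbour resize of x (C, H, W) to (C, size[0], size[1])."""
    rows = (np.arange(size[0]) * x.shape[1]) // size[0]
    cols = (np.arange(size[1]) * x.shape[2]) // size[1]
    return x[:, rows][:, :, cols]

def top_down(features, lateral, smooth):
    """features: maps ordered from l=1 (finest) to l=L (coarsest).
    lateral[l] = (w, b) of the 1x1 lateral conv; smooth[l] = (w, b) of the 3x3 conv
    applied after up-sampling into level l (smooth[L-1] is unused)."""
    L = len(features)
    merged = [None] * L
    merged[L - 1] = relu(conv1x1(features[L - 1], *lateral[L - 1]))
    for l in range(L - 2, -1, -1):
        lat = relu(conv1x1(features[l], *lateral[l]))
        up = upsample_nearest(merged[l + 1], lat.shape[1:])
        merged[l] = relu(conv3x3(up, *smooth[l])) + lat  # assumed merge: addition
    return merged

# Toy example with AlexNet-like feature shapes and random weights.
rng = np.random.default_rng(0)
channels = [64, 192, 384, 256, 256]
sizes = [27, 13, 13, 13, 13]
D = 16
feats = [rng.standard_normal((c, s, s)) for c, s in zip(channels, sizes)]
lateral = [(0.1 * rng.standard_normal((D, c)), np.zeros(D)) for c in channels]
smooth = [(0.1 * rng.standard_normal((D, D, 3, 3)), np.zeros(D)) for _ in channels]
pyramid = top_down(feats, lateral, smooth)
print([p.shape for p in pyramid])
# [(16, 27, 27), (16, 13, 13), (16, 13, 13), (16, 13, 13), (16, 13, 13)]
```

Because all pyramid levels share the width $D$, each one can be pooled into its own $\pooledvec_l$ and fed to an independent head.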

For ELNet, we took $\{\featuremaps_{i,l}\}_{l=1}^{L}$ to be, respectively, the outputs of the first, second, third, and fourth blur-pool layers and the output of the convolutional layer preceding the last blur-pool layer.
Each lateral connection consists of a $1 \times 1$ convolutional layer with ReLU activation, while
the convolutional module following the up-sampling layer consists of a convolutional layer with kernel size 3, ReLU activation, and ELNet's original normalization layer.$^{\ref{note1}}$
Based on our experiments, $\pyramidlevels$ was set to 6 and 7 for the ACL tear and meniscal tear tasks, respectively.
Each pooled vector $\pooledvec_l$, $l = 1, \dots, L$, is fed to a fully connected layer with two output nodes and softmax activation, and, following ELNet, we used the standard cross-entropy loss for $\loss(\modeloutput_l, \groundtruth)$.
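The per-level losses of the two configurations can be made concrete with the plain-Python sketch below. Several details are assumptions not stated above: the class weighting (each sample is weighted by the relative frequency of the opposite class, so the rarer class counts more), the clipping constant, and the summation of the per-level losses $\loss(\modeloutput_l, \groundtruth)$ into a single objective. The helper names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

def class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Assumed weighting: each class is weighted by the frequency of the other class."""
    pos_frac = sum(labels) / len(labels)
    return {0: pos_frac, 1: 1.0 - pos_frac}

def weighted_bce(prob: float, label: int, weights: Dict[int, float], eps: float = 1e-7) -> float:
    """Sigmoid head (MRNet setting): binary cross-entropy scaled by the class weight."""
    p = min(max(prob, eps), 1.0 - eps)  # clip to avoid log(0)
    return -weights[label] * (label * math.log(p) + (1 - label) * math.log(1.0 - p))

def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Two-logit softmax head (ELNet setting): cross-entropy via a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def multi_level_loss(outputs: List, label: int, level_loss: Callable) -> float:
    """Assumed aggregation: sum of the losses of the L per-level heads."""
    return sum(level_loss(out, label) for out in outputs)

# Toy usage with two pyramid levels.
weights = class_weights([1, 0, 0, 0])  # {0: 0.25, 1: 0.75}
mrnet_loss = multi_level_loss([0.5, 0.5], 1, lambda p, y: weighted_bce(p, y, weights))
elnet_loss = multi_level_loss([[0.0, 0.0], [0.0, 0.0]], 0, softmax_cross_entropy)
print(round(mrnet_loss, 4), round(elnet_loss, 4))  # 1.0397 1.3863
```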





\subsection{Datasets}

\paragraph{MRNet Dataset.}
The MRNet dataset \citep{MRNet} (referred to here as MRNetData) is the largest publicly available knee MRI dataset to date. It consists of 1370 manually curated knee MRI examinations performed at the Stanford University Medical Center over a 12-year period. Each exam includes axial (proton density-weighted), coronal (T1-weighted), and sagittal (T2-weighted) series acquired on GE scanners. Each exam is labeled for the presence or absence of an ACL tear, a meniscal tear, and general abnormalities; in this work, we considered only the first two tasks.
The dataset authors randomly split the exams into 1130 training exams (1088 patients), 120 validation exams (111 patients), and 120 test exams (113 patients), ensuring that each split contained at least 50 cases of each pathology.
Each MRI slice is $256 \times 256$ pixels, and the number of slices per series ranges from 17 to 61 (mean 31, standard deviation 7.97).


\paragraph{kneeMRI Dataset.}
The kneeMRI dataset was acquired by \citet{kneeMRI} at the Clinical Hospital Centre Rijeka, Croatia, between 2007 and 2014.
It contains 917 sagittal proton density-weighted exams acquired on a 1.5\,T Siemens Avanto scanner. Based on radiologist reports, the dataset authors labeled each exam according to the degree of ACL injury: non-injured (690 exams), partially injured (172 exams), or completely ruptured (55 exams).
Each MRI slice is $320 \times 320$ or $290 \times 300$ pixels, and the number of slices per series ranges from 21 to 45 (mean 31, standard deviation 2.27).
Following \citet{MRNet}, we considered the binary classification task of discriminating non-injured ACLs from injured (partially injured or completely ruptured) ACLs.
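The binary task amounts to merging the two injured grades into a single positive class. The short sketch below assumes the three grades are encoded as the integers 0, 1, and 2 (a hypothetical encoding) and makes the resulting class imbalance explicit: 690 negatives versus $172 + 55 = 227$ positives.

```python
from collections import Counter

# Hypothetical integer encoding of the three kneeMRI ACL grades.
ACL_GRADES = {0: "non-injured", 1: "partially injured", 2: "completely ruptured"}

def to_binary(grade: int) -> int:
    """Collapse the ACL grade into the binary task: 0 = non-injured, 1 = injured."""
    if grade not in ACL_GRADES:
        raise ValueError(f"unknown ACL grade: {grade}")
    return 0 if grade == 0 else 1

# Class counts reported for kneeMRI: 690 / 172 / 55 exams.
grades = [0] * 690 + [1] * 172 + [2] * 55
counts = Counter(to_binary(g) for g in grades)
print(counts[0], counts[1])  # 690 227
```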

\subsection{Performance Evaluation and Measures}
On MRNetData, models were trained on the training set and evaluated on the validation set, since the official test set is sequestered by the dataset authors.
For the kneeMRI dataset, performance was assessed with 5-fold cross-validation, using, in each fold, 80\% of the exams for training and the remaining 20\% for validation.
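A minimal plain-Python sketch of such a split is given below. It is not necessarily the exact procedure used in our experiments: whether folds were stratified by label or grouped by patient is not specified here, so this version simply shuffles the exam indices with a fixed seed and cuts them into five near-equal folds. \texttt{five\_fold\_splits} is a hypothetical helper; scikit-learn's \texttt{StratifiedKFold} would be a natural choice for a label-stratified variant.

```python
import random
from typing import List, Tuple

def five_fold_splits(n_exams: int, seed: int = 0, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Shuffle exam indices once, cut them into k near-equal folds, and use each fold
    in turn for validation (about 20%) and the remaining folds for training (about 80%)."""
    idx = list(range(n_exams))
    random.Random(seed).shuffle(idx)
    fold_sizes = [n_exams // k + (1 if i < n_exams % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        val = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        splits.append((train, val))
        start += size
    return splits

splits = five_fold_splits(917)  # 917 kneeMRI exams
print([len(val) for _, val in splits])  # [184, 184, 183, 183, 183]
```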
As quantitative measures, we used the area under the receiver operating characteristic curve (ROC-AUC), accuracy, sensitivity, and specificity, where the last three were obtained by thresholding the predicted probabilities at 0.5.
\rev{Each experiment was run three times with different random seeds (the same three seeds for all experiments)}. We report the mean and standard deviation of each metric.
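The measures above can be computed as in the following plain-Python sketch, which is illustrative rather than our evaluation code. ROC-AUC is computed through its equivalence with the Mann-Whitney statistic (the value scikit-learn's \texttt{roc\_auc\_score} also returns). The assumptions are that a probability exactly equal to 0.5 counts as a positive prediction and that the reported spread is the sample ($n-1$) standard deviation; the function names are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence

def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def threshold_metrics(labels: Sequence[int], probs: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity; assumed rule: prob >= threshold -> positive."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

def evaluate(labels: Sequence[int], probs: Sequence[float]) -> Dict[str, float]:
    return {"roc_auc": roc_auc(labels, probs), **threshold_metrics(labels, probs)}

def summarize(runs: List[Dict[str, float]]) -> Dict[str, tuple]:
    """Mean and sample standard deviation of each metric over the seeded runs."""
    return {k: (statistics.mean([r[k] for r in runs]), statistics.stdev([r[k] for r in runs]))
            for k in runs[0]}

labels = [1, 1, 0, 0]
runs = [evaluate(labels, probs) for probs in (
    [0.9, 0.4, 0.6, 0.1],  # toy predictions, one list per seed
    [0.9, 0.8, 0.2, 0.1],
    [0.7, 0.6, 0.5, 0.4],
)]
summary = summarize(runs)
print(runs[0])  # {'roc_auc': 0.75, 'accuracy': 0.5, 'sensitivity': 0.5, 'specificity': 0.5}
```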

\input{tables/sota2.tex}

\input{tables/sotakneemri.tex}


\subsection{Implementation Details}

The code\footnote{\codelink} was implemented in Python using the PyTorch \citep{PyTorch} and scikit-learn \citep{scikitlearn} machine learning libraries.
MRNet and ELNet were implemented from the code released by their authors, following the details given in the respective papers.
Experiments were run on a machine with an Intel Xeon E5-2690 v4 @ 2.60GHz CPU, 320 GB of RAM, and an NVIDIA TITAN V GPU.

