\documentclass{midl} % Include author names
%\documentclass{article}
%\documentclass[anon]{midl} % Anonymized submission

% The following packages will be automatically loaded:
% jmlr, amsmath, amssymb, natbib, graphicx, url, algorithm2e
% ifoddpage, relsize and probably more
% make sure they are installed with your latex distribution
\usepackage{footnote}
\usepackage{tabularx}
\usepackage{graphicx}
\usepackage{multirow}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{algorithm2e}
\usepackage{hyperref}
\usepackage{makecell}
% \usepackage[numbers]{natbib}

%\makesavenoteenv{tabular}
%\makesavenoteenv{table}
%\usepackage{mwe} % to get dummy images
%\usepackage{makecell}
%\usepackage{multirow}

%\jmlrvolume{-- Under Review}
%\jmlryear{2021}
%\jmlrworkshop{Full Paper -- MIDL 2021 submission}
%\editors{Under Review for MIDL 2021}
\jmlryear{2021}
\jmlrworkshop{Full Paper -- MIDL 2021}

\newcommand{\hl}[1]{\textcolor{red}{{[PP: #1]}}}

\newcommand{\etal}{{\em et al.}\,}
\newlength\mylen
\newcommand\myinput[1]{%
  \settowidth\mylen{\KwIn{}}%
  \setlength\hangindent{\mylen}%
  \hspace*{\mylen}#1\\}


\title{Prediction of COVID-19 Lung Infiltrate Progression on Chest Radiographs Using Spatio-temporal LSTM based  Encoder-Decoder Network}

\title[Predicting COVID-19 Progression on CXR Using Spatio-temporal LSTM based Network]{Predicting COVID-19 Lung Infiltrate Progression on Chest Radiographs Using Spatio-temporal LSTM based  Encoder-Decoder Network}


 % Use \Name{Author Name} to specify the name.
 % If the surname contains spaces, enclose the surname
 % in braces, e.g. \Name{John {Smith Jones}} similarly
 % if the name has a "von" part, e.g \Name{Jane {de Winter}}.
 % If the first letter in the forenames is a diacritic
 % enclose the diacritic in braces, e.g. \Name{{\'E}louise Smith}

 % Two authors with the same address
% \midlauthor{abc abc@abc.edu \\ xyz xyz@xyz.edu \\ Department of Computing Science}



 % Three or more authors with the same address:
%   \midlauthor{\Name{Author Name1} \Email{an1@sample.edu}\\
%   \Name{Author Name2} \Email{an2@sample.edu}\\
%   \Name{Author Name3} \Email{an3@sample.edu}\\
%   \addr Address}


% Authors with different addresses:
\midlauthor
{\Name{Aishik Konwer\nametag{$^{1}$}} \Email{akonwer@cs.stonybrook.edu}\\
% \addr $^{1}$ Department of Computer Science, Stony Brook University, NY, USA
% \AND
\Name{Joseph Bae\nametag{$^{2}$}} \Email{joseph.bae@stonybrookmedicine.edu}\\
% \addr $^{2}$ Department of Biomedical Informatics, Stony Brook University, NY, USA
% \AND
\Name{Gagandeep Singh\nametag{$^{3}$}} \Email{gagandeep.singh@rwjbh.org}\\
\Name{Rishabh Gattu\nametag{$^{3}$}} \Email{rishabh.gattu@rwjbh.org}\\
\Name{Syed Ali\nametag{$^{3}$}} 
\Email{syedhali35@gmail.com}\\
\Name{Jeremy Green\nametag{$^{3}$}} \Email{jeremy.green@rwjbh.org}\\
\Name{Tej Phatak\nametag{$^{3}$}} \Email{tej.phatak@rwjbh.org}\\
% \addr $^{3}$ Department of Radiology, Newark Beth Israel Medical Center, NJ, USA
% \AND
\Name{Amit Gupta \nametag{$^{4}$}} \Email{amit.gupta@uhhospitals.org}\\
% \addr $^{4}$ University Hospitals Cleveland Medical Center, OH, USA
% \AND
\Name{Chao Chen\nametag{$^{2}$}} 
\Email{Chao.Chen.1@stonybrook.edu }\\
\Name{Joel Saltz\nametag{$^{2}$}} \Email{Joel.Saltz@stonybrookmedicine.edu}\\
\Name{Prateek Prasanna\nametag{$^{2}$}} \Email{Prateek.Prasanna@stonybrook.edu}\\
\addr $^{1}$ Department of Computer Science, Stony Brook University, NY, USA\\
\addr $^{2}$ Department of Biomedical Informatics, Stony Brook University, NY, USA\\
\addr $^{3}$ Department of Radiology, Newark Beth Israel Medical Center, NJ, USA\\
\addr $^{4}$ University Hospitals Cleveland Medical Center, OH, USA
}

%\footnotetext[1]{Contributed equally}



\begin{document}

\maketitle

\begin{abstract}

Automated analyses of chest imaging in Coronavirus Disease 2019 (COVID-19) have largely focused on a single timepoint, usually at disease presentation, and have not explicitly taken into account temporal disease manifestations. We present a deep learning-based approach for prediction of imaging progression from serial chest radiographs (CXRs) of COVID-19 patients. Our method first utilizes convolutional neural networks (CNNs) for feature extraction from patches within the concerned lung zone, and also from neighboring areas to enhance the contextual phenotypic information. The framework further incorporates two distinct spatio-temporal Long Short-Term Memory (LSTM) modules for effective predictions. The first LSTM module captures spatial dependencies between patches and the second exploits the temporal context of sequential CXR scans. The resulting network focuses on critical image regions that provide relevant information for learning the progression of lung infiltrates without the explicit need for infiltrate segmentation. The second LSTM provides an encoded context vector used as an input to a decoder module to predict future severity grades. Our novel multi-institutional dataset comprises sequential CXR scans from N=100 patients. Specifically, our framework predicts zone-wise disease severity for a patient on the last day by learning representations from the previous temporal CXRs. We design two baseline approaches, one using fine-tuned VGG-16 features and the other using radiomic descriptors. Experimental results demonstrate that our proposed approach outperforms both baselines in average accuracy by 10.33\% and 12.16\%, respectively, in predicting COVID-19 progression severity.

\end{abstract}

\begin{keywords}
COVID-19, proning, convolutional neural network,  chest  radiographs, long short term memory, transfer learning
\end{keywords}


\newcommand{\myparagraph}[1]{\smallskip\noindent\textbf{#1}}

\section{Introduction}



Coronavirus disease 2019 (COVID-19) has infected 107 million people worldwide and caused over 2 million deaths as of February 2021 \cite{dong_interactive_2020}. Currently, chest radiography (CXR) is the primary imaging modality for disease monitoring \cite{noauthor_acr_nodate}. Findings of COVID-19 infection on CXR include the presence of infiltrates, opacities, and consolidations \cite{hui_clinical_2020}. These findings vary in quantity and location throughout the disease course of COVID-19. Studies have reported that the spatial distribution of these radiographic findings within lung zones is of clinical significance \cite{hui_clinical_2020,toussie_clinical_2020}. For instance, the presence of lung findings on CXR in multiple lobes has been shown to be correlated with severe disease \cite{toussie_clinical_2020}. 

\myparagraph{Clinical motivation.}
Due to their convenience, CXRs can be taken serially for inpatients with COVID-19. Expert interpretation of these serial images can be used to monitor COVID-19 progression. Recent studies have suggested that placing patients in the prone position improves clinical outcomes for patients receiving mechanical ventilation in the setting of other illnesses \cite{guerin_prone_2013}. However, studies have not yet explored whether its effect on disease progression can be appraised on CXR.
Figure~\ref{fig:progression} shows AP (antero-posterior) radiographs of the chest (a-d) from a single patient, demonstrating the lung infiltrate burden over the course of four days during prone ventilation. Lung contours are coloured green. There are currently no models that can predict the severity of disease, as manifested on imaging at a later time point, based on the trajectory in the first few days of treatment. As an application of our study on serial medical images, we can provide imaging evidence of disease improvement. Radiographic findings on sequential CXRs might be analyzed to provide insights into when proning or other treatments should be initiated and for what duration proning is most effective in patients undergoing mechanical ventilation. 
 %Recent studies have suggested that placing patients in a prone position in the setting of CARDS during mechanical ventilation improves lung compliance, arterial oxygen partial pressure to fractional inspired oxygen ($PaO_2:FiO_2$) ratio, and lung recruitability \cite{pan_lung_2020,ziehr_respiratory_2020}. Imaging evidence of disease improvement would lend further evidence for the benefits of proning technique.
 %Additionally, the evolution of radiographic findings on sequential CXR might be analyzed to provide insight into when proning should be initiated and for what duration proning is most effective in CARDS patients undergoing mechanical ventilation.% 

% In this paper, we present a new imaging study of the proning procedure for COVID-19 patients undergoing mechanical ventilation using portable CXRs. 
  
% Aside from temporal information, we also pay special attention to spatial information. 
% \cc{The transition to the significance of the spatial information is not very visible.}

% \cc{Should we briefly describe the study here? How many patients with and without proning treatments.}

\myparagraph{Technical motivation.} Existing deep learning (DL) based COVID-19 studies primarily utilize single-timepoint radiographic images rather than serial CXRs taken at different timepoints \cite{bae_predicting_2020,shi_review_2020}. By analyzing sequential CXRs, our work aims to more accurately model disease progression.

Recurrent neural networks (RNNs) have been mainly applied for prediction of time series data in problems related to natural language processing and computer vision. RNNs have been found to be quite successful in a variety of health-care tasks such as disease progression prediction \cite{dp1,dp2} and electronic health record analysis \cite{ehealth1,ehealth2}. Previous works have explored the use of gated recurrent units to predict the evolution of tumors \cite{tumor} and treatment response from serial medical images \cite{serial1,serial2}. In RNNs, the past hidden states of an object are passed through a weighted non-linear function to predict its state at a future timepoint. As a result, relevant past information is stored and used for future predictions. As an extension of the RNN, the Long Short-Term Memory (LSTM) network \cite{lstm} is specifically designed to capture long-term patterns that are commonly found over long periods in patients' records \cite{sundermeyer}. LSTM-based approaches have achieved great success in many applications that involve sequential data, such as video processing. Several publications employ LSTMs for medical data~\cite{jiang_predicting_2018, lao_leveraging_2018, wang_toward_2019, wang_predictive_2018}. However, most are based on clinical measurements \cite{clinicalrec1}, although a few use the concept of disease progression modeling \cite{ehealth2}.

Despite the great success of LSTMs, one of their major drawbacks is the difficulty of interpreting their predictions. \cite{interpret} shows that capturing interpretable information is more significant than building a robust deep network in disease progression scenarios. Also, LSTMs do not directly consider irregular time intervals between consecutive events. Previous works have shown that LSTMs are able to correlate features from different image regions \cite{spatial1,spatial2}. We incorporate this idea into our framework to generate more discriminative features from the image patches. None of the previous works jointly exploit the spatial distribution within images and the temporal information across timepoints. We use a framework that encodes a combination of spatial and temporal information. Similar models have achieved great success in action prediction from video data \cite{video1,video2,video3}.


Using a unique cohort of multi-institutional serial CXRs (N=100 patients), we present a novel two-stage LSTM-based encoder-decoder network to predict CXR severity progression. 
Sequential CXRs taken for a patient are used as inputs to this model to predict imaging severity scores for future CXRs. Our framework learns both temporal and spatial information from CXR images. 
The first stage, called \emph{LSTM-Spatial}, aggregates spatial information from different locations. This module also takes into account imaging variability in the immediately adjoining lung zones. The second stage, \emph{LSTM-Temporal}, learns to aggregate information from temporal CXRs and unravels the information to predict the severity at a future time point. 

\begin{figure}[t]

  	\begin{minipage}[b]{1.0\linewidth}
  		\centering
  		\centerline{\includegraphics[width= 6 in]{prog.png}}
  	\end{minipage}

  	\caption{AP (antero-posterior) radiographs of the chest (a-d) from a single patient demonstrating the lung infiltrate burden over the course of four days. (a) Ground glass opacities throughout the left and right lung zones on day 1. (b) Slightly increased opacities throughout the aforementioned lung zones. (c) Disease burden has increased to extensive confluent consolidations in the bilateral middle and lower lung zones. (d) Findings resemble those seen on day 1.}
  	\label{fig:progression}
  %\vspace{-.3cm}
\end{figure}

% for a single patient to analyze the progression patterns of imaging infiltrates \cite{bae_predicting_2020,shi_review_2020}. 
% Long short-term memory (LSTM) neural networks are well-suited to analyzing these sequential datasets as they use previous data entries in a time-series to forecast future results. 
% Deep learning (DL) has been applied extensively to studying radiologic images including CXRs and CTs in the setting of COVID-19, but few studies have explored serial CXRs taken at different timepoints for a single patient to analyze the progression patterns of imaging infiltrates \cite{bae_predicting_2020,shi_review_2020}. % Long short-term memory (LSTM) neural networks are well-suited to analyzing these sequential datasets as they use previous data entries in a time-series to forecast future results. 

\subsection{Contributions}

\begin{itemize}
    \item Our work uses a multi-stage spatio-temporal LSTM framework to model the progression of COVID-19 lung infiltrates over multiple timepoints and predict the infiltrate severity at a later time.
    % Our approach learns to map an input sequence of variable length into a fixed-dimensional vector representation. CXRs are present for a variable number of days for each patient, and each image contains a variable number of patches - both the sequences are handled by using LSTM.
    % \item We have used a two-layer LSTM in our proposed method - \emph{LSTM-Spatial}  captures spatial dependencies between the patches and \emph{LSTM-Temporal} exploits the temporal context between successive CXR images. The joint use of these LSTMs improves the quality of our predictions.
    \item  We are the first to use a temporal COVID-19 imaging dataset for severity prediction. Our proposed model has been evaluated on this dataset (N=100 patients, 657 CXRs) and compared against multiple baseline approaches.
    
\end{itemize}


\section{Method}

In this IRB-approved study, temporal sequences of varying numbers of CXR images were curated for N=100 patients. The number of images for each patient is denoted by $D$, which ranges from 4 to 13. The images corresponding to $D$ days are represented by $I_{t_1}, I_{t_2},\ldots,I_{t_{D-1}},I_{t_D}$. The lung fields for both right ($R$) and left ($L$) lungs were automatically segmented using a Residual UNet model \cite{bae_predicting_2020}. We do not perform any image co-registration. Instead, to avoid any possible bias from the temporal data, these masks were further subdivided into upper ($L_{1},R_{1}$), middle ($L_{2},R_{2}$), and lower ($L_{3},R_{3}$) lung zones, with each zone comprising approximately one third of the entire lung field. Based on the observed infiltrate patterns, each of these 6 lung zones was independently assigned a severity score $g_{0}=0, g_{1}=1,$ or $g_{2}=2$ by three expert readers in consensus, representing mild, moderate, and high disease severity, respectively. The severity grades of the last image $I_{t_D}$ are used as the ground truth for the severity prediction at timepoint $t_D$.


Our model consists of 6 encoder-decoder frameworks to facilitate zone-wise predictions, represented by $F_{L_{i}}$ and $F_{R_{i}}$, where $i=1,2,3$. Figure \ref{fig:lstm} shows the $F_{L_{1}}$ framework, which takes patches from the $L_{1}$ zone as input. 

\subsection{Two-stage encoder LSTM}

% \cc{This two-layer LSTM should be the first to present in the method section. Then we fill in other details.}
Capturing spatial dependencies for CXR imaging findings is a critical step in our analysis due to the nature of COVID-19 clinical progression. Recurrent neural networks (RNNs) enable the modeling of data sequences, allowing inputs with a varying number of patches. However, RNNs can suffer from vanishing gradients during back-propagation, which restricts their ability to handle excessively long contextual temporal information \cite{bpp}. LSTM models address this issue by introducing three gating units: input, output, and forget gates \cite{lstm}. %These gates are incorporated into a block to model large long-temporal dependencies by preserving the gradient norm during back propagation. Input gate determines the amount of input information to be stored in hidden state. Output gate focuses on which hidden state information should be included in current time step output. Forget gate decides the hidden state information that should not be further remembered.%
The gates operate based on the present input and the previous hidden states.
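
For reference, the gating mechanism can be written in the standard textbook LSTM form below. This is a sketch in common notation, not a statement of the exact variant used in the implementation; the symbols $x_t$ (input at timestep $t$), $h_t$ (hidden state), $c_t$ (cell state), the weight matrices $W_{\ast}, U_{\ast}$, and the biases $b_{\ast}$ are introduced here for illustration only.
\begin{align*}
i_t &= \sigma\bigl(W_i x_t + U_i h_{t-1} + b_i\bigr), \\
f_t &= \sigma\bigl(W_f x_t + U_f h_{t-1} + b_f\bigr), \\
o_t &= \sigma\bigl(W_o x_t + U_o h_{t-1} + b_o\bigr), \\
\tilde{c}_t &= \tanh\bigl(W_c x_t + U_c h_{t-1} + b_c\bigr), \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \\
h_t &= o_t \odot \tanh(c_t).
\end{align*}
Here $\sigma$ is the logistic sigmoid and $\odot$ denotes element-wise multiplication. Because the cell state $c_t$ is updated additively, gradients can propagate across many timesteps without vanishing as quickly as in a plain RNN.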

Our framework includes two LSTM modules: 1) \emph{LSTM-Spatial} to learn the patch diversity at different spatial locations of an image, which we refer to as ``spatial dependencies'', and 2) \emph{LSTM-Temporal} to exploit the ``temporal dependencies'' between CXR images from multiple days.

The feature representations of all image patches, obtained from the CNN described below (sub-section~\ref{subsec_CNN}), are fed into \emph{LSTM-Spatial} in the same order in which the extracted patches were provided to the CNN. The number of timesteps in \emph{LSTM-Spatial} depends on the number of CNN feature maps obtained for each zone for each day. Timesteps range from $1$ to $P$ for each day, where $P$ refers to the number of patches obtained from a given zone. The output from each timestep is a $1\times512$ dimensional feature vector. Thus, as the output of \emph{LSTM-Spatial}, we obtain a $P\times512$ dimensional feature matrix, where $P$ varies day-wise and, further, patient-wise. It can be seen from Figure \ref{fig:lstm} that day `$t_1$' of the particular patient has $P$ input patches, a number which may differ across days $[{t_2},...,{t_{D-1}}]$. To construct a holistic feature representation for each of the $D-1$ days of a particular patient, we take a single feature vector from the last cell state (denoted by \emph{LCS} in Figure \ref{fig:lstm}) of \emph{LSTM-Spatial}. This is the global feature of the entire sequence of patches for a particular day, and has dimension $1\times512$. We provide these day-wise global features as inputs to the timesteps of \emph{LSTM-Temporal}, which has $D-1$ timesteps. Finally, from the last cell state of \emph{LSTM-Temporal}, we obtain the encoder module context vector.
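
The two-stage encoding described above can be summarized compactly as follows. The notation is introduced here for illustration only: $x_{d,p}$ denotes the CNN feature of patch $p$ on day $t_d$, $P_d$ the number of patches on that day, and $(h, c)$ the hidden and cell states of the respective LSTM. Initial states are assumed to be zero, which is not stated in the text.
\begin{align*}
(h^{S}_{d,p}, c^{S}_{d,p}) &= \mathrm{LSTM}_{\mathrm{Spatial}}\bigl(x_{d,p},\, h^{S}_{d,p-1},\, c^{S}_{d,p-1}\bigr), & p &= 1,\ldots,P_d, \\
z_d &= c^{S}_{d,P_d} \in \mathbb{R}^{1\times 512}, & d &= 1,\ldots,D-1, \\
(h^{T}_{d}, c^{T}_{d}) &= \mathrm{LSTM}_{\mathrm{Temporal}}\bigl(z_d,\, h^{T}_{d-1},\, c^{T}_{d-1}\bigr), & d &= 1,\ldots,D-1, \\
v &= c^{T}_{D-1}.
\end{align*}
Here $z_d$ is the day-wise global feature (\emph{LCS}) and $v$ is the encoder context vector passed to the decoder.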


\begin{figure*}[t]

  	\begin{minipage}[b]{1.0\linewidth}
  		\centering
  		\centerline{\includegraphics[width= 5.2 in]{Final_LSTM.png}}
  	\end{minipage}

  	\caption{Architecture of the proposed LSTM approach.}
  	\label{fig:lstm}
  	%\vspace{-.3cm}
\end{figure*}

\subsection{Patch extraction}

A sliding window approach was employed to extract dense patches from each lung zone. The patch dimension used for our framework is $256\times256$, and the stride of the window is 128 pixels. Because segmented masks from CXRs have non-uniform dimensions, the number of patches extracted for a lung zone varies from day to day for a particular patient. Patches with more than 80\% background pixels were discarded. In our dataset, a large proportion of image zones are assigned $g_{1}$, making it the majority class. To address this class imbalance, we randomly upsampled the number of patches labelled $g_{0}$ and $g_{2}$ by 25\%. Thus, for the encoder-decoder framework of a particular zone, we achieved a more balanced proportion of patches with grades $g_{0},g_{1}$, and $g_{2}$. Moreover, features from neighboring areas have been reported to enhance the contextual information of a zone in medical imaging \cite{toussie_clinical_2020}. Therefore, in the pool of extracted patches used as input for a zone, we also included an array of patches from the boundary of its adjoining zones. As an example, for patches from $L_{2}$ shown in Figure \ref{fig:lstm}, the closest patch arrays from adjoining zones $L_{1}$ and $L_{3}$ are included.
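
To make the dependence of the patch count on zone size concrete, consider a zone whose bounding box is $H\times W$ pixels with $H,W\geq 256$. Assuming no padding at the borders (an assumption made here for illustration; the border handling is not specified above), a window of size $256$ with stride $128$ yields
\begin{equation*}
P_{\mathrm{cand}} = \left(\left\lfloor \frac{H-256}{128} \right\rfloor + 1\right)\left(\left\lfloor \frac{W-256}{128} \right\rfloor + 1\right)
\end{equation*}
candidate patches. For a hypothetical zone of $768\times1024$ pixels this gives $(4+1)(6+1)=35$ candidates, from which patches with more than 80\% background pixels are discarded and to which boundary patches from adjoining zones are added.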

\subsection{CNN for feature extraction}
\label{subsec_CNN}
After patch extraction, for each of the 6 zone-wise encoder-decoder frameworks, a CNN was employed to obtain image feature representations. For each framework, patches from the concerned lung zone were fed as input to this CNN in a day-wise manner, spanning ${D-1}$ days. The number of days varies for each patient depending on the number of time-points available in the dataset. For example, for the patient in Figure \ref{fig:progression}, $D=4$. The output of the CNN for each patch at each such time-point is a $1\times256$ dimensional feature vector. The CNN contains five convolutional layers, each followed by a max-pooling operation, and terminates with a fully connected layer. 
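
As a rough illustration of how a $256\times256$ patch is reduced before the fully connected layer, assume that every convolution preserves the spatial size and that every max-pooling layer uses a $2\times2$ window with stride 2. Both are assumptions made here for illustration; the kernel sizes, padding, and channel counts are not specified above. The side length $s_\ell$ after the $\ell$-th pooling layer is then
\begin{equation*}
s_\ell = \frac{256}{2^{\ell}}, \qquad (s_1,\ldots,s_5) = (128,\,64,\,32,\,16,\,8),
\end{equation*}
so that, under these assumptions, the fully connected layer maps a flattened $8\times8\times C$ tensor (with $C$ the unspecified number of output channels) to the $1\times256$ feature vector.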

\subsection{The decoder module}
A decoder module is defined to decode the encoded vector representation from \emph{LSTM-Temporal} and predict the grades on day $t_{D}$ for each lung zone. The encoded context vector and a start token ($EOS$) are used as inputs \cite{Sequence} to the first timestep of the decoder LSTM. We then apply a softmax layer to classify the decoder output into $g_{0},g_{1}$, or $g_{2}$. 
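
The final classification step can be written as follows, where $o\in\mathbb{R}^{3}$ denotes the decoder output scores for one zone (a symbol introduced here for illustration) and the predicted grade is assumed to be the most probable class:
\begin{equation*}
\hat{p}_k = \frac{\exp(o_k)}{\sum_{j=0}^{2}\exp(o_j)}, \qquad \hat{g} = g_{\hat{k}}, \quad \hat{k} = \arg\max_{k\in\{0,1,2\}} \hat{p}_k .
\end{equation*}
The probabilities $\hat{p}_0,\hat{p}_1,\hat{p}_2$ sum to one, so each zone receives exactly one of the three severity grades.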


\section{Experiments}

\subsection{Dataset Description}
The multi-institutional dataset consists of a unique cohort of AP CXRs from 23 COVID-19 patients at Newark Beth Israel Medical Center~\cite{cowan2021evolution}, and from 77 COVID-19 patients at Stony Brook University Hospital (657 scans in total). CXRs were 3470$\times$4234 pixels in size. The interval (number of days) between consecutive CXRs is variable. For a particular patient, there can be a span of 1 day or even 5 days between two sequential timepoints.

\subsection{Implementation Details}
A cross entropy loss function was selected for training and was optimized with an Adam optimizer for both the CNN and the LSTM. The learning rate was held constant at 0.0001, and the maximum number of epochs was set to 15. We used packed padded sequences in PyTorch to mask out all losses beyond the required sequence length. Thus we could nullify the effect of missing timesteps for a patient in the dataset. For each model, we performed a 5-fold cross validation, with 20 distinct test cases (patients) in each fold. Each time, the 80 remaining cases were randomly divided into 60 training and 20 validation cases.
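
For a single zone, the cross entropy loss takes the standard form below. The notation is introduced here for illustration: $y\in\{0,1\}^{3}$ is the one-hot encoding of the expert grade, $\hat{p}_k$ is the predicted probability of grade $g_k$, and $k^{\ast}$ is the index of the true grade.
\begin{equation*}
\mathcal{L} = -\sum_{k=0}^{2} y_k \log \hat{p}_k = -\log \hat{p}_{k^{\ast}} .
\end{equation*}
For example, if the true grade is $g_1$ and the model assigns $\hat{p}_1 = 0.7$, then $\mathcal{L} = -\ln 0.7 \approx 0.357$. With 5 folds of 20 distinct test patients each, every one of the 100 patients is used for testing exactly once.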

\subsection{Baseline approach}
%\cc{It makes more sense to move this section to the third section (titled baseline). And the term "transfer learning" made the baseline sounds too good.}
\myparagraph{Approach 1.}
We trained 6 models in this baseline approach, one for each of the 6 lung zones. The last layer of the VGG-16 network \cite{vgg16} was replaced with a mini network of 2 small fully connected layers. The new network was trained after freezing all other pre-trained convolutional weights, as shown in Figure \ref{fig:baseline} of the Appendix. 

For this baseline, we considered the first $D-1$ days' images of a patient, that is, $I_{t_1}$, $I_{t_2}$,\ldots, $I_{t_{D-1}}$. For each patient, $64\times64$ patches were extracted with a sliding window from the concerned zone of each image $I$ with a stride of 32. Upsampling of minority label patches and inclusion of neighboring patches were adopted, similar to our LSTM approach. Features were extracted from these patches to obtain a $P_{L}\times4096$ dimensional feature matrix. $P_L$ denotes the number of patches for a particular zone over $D-1$ days, which may vary across different patients. The output for each patient was averaged into a $1\times4096$ feature vector.

We trained a 1D neural network with these extracted feature vectors to perform the final classification task. In the testing phase for each patient, the classifier predicted severity scores for each patch. A majority voting approach was then employed on these patch classification scores to obtain a single zone-wise severity score. This score was compared against the ground truth severity grade of image $I_{t_D}$ to compute the evaluation metrics.

An SGD optimizer with a batch size of 64 was applied to minimize the objective. The VGG-16 network was fine-tuned for 30 epochs with a learning rate of 0.0001. Categorical cross entropy loss was used as the cost function. 

\myparagraph{Approach 2.}
We trained 6 different models of this baseline, one for each of the 6 lung zones. In this approach, 445 texture-based radiomic features~\cite{prasanna2017radiomic, thawani2018radiomics} were extracted from the patches of the concerned zone of a patient~\cite{pyrad}. These feature vectors were averaged and passed on to a random forest classifier. An approach identical to the previous baseline was used to compute the classification performance.

% \begin{figure}[t]

%   	\begin{minipage}[b]{1.0\linewidth}
%   		\centering
%   		\centerline{\includegraphics[width= 4 in]{baseline_new.png}}
%   	\end{minipage}

%   	\caption{Architecture of the baseline approach
%   }
%   	\label{fig:baseline}
%   	%\vspace{-.3cm}
% \end{figure}


\section{Results}

Experimental results were quantitatively evaluated using accuracy ($Acc$), precision ($Pre$), and recall ($Rec$) metrics. $Acc$ was measured zone-wise, whereas $Pre$ and $Rec$ were calculated on a grade level. The results of our approach and the baseline methods are reported in Tables \ref{Left results} and \ref{Right results} for the left and the right lung zones, respectively (standard deviations reported in Appendix D). Notably, the proposed two-stage LSTM network consistently outperforms the designed baseline models in accuracy. This is likely because our network is able to exploit spatial and temporal dependencies in CXR images; in contrast, the two baseline methods average convolutional and radiomic feature vectors, respectively, from $I_{t_1}$, $I_{t_2}$,\ldots, $I_{t_{D-1}}$. 
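
For clarity, the metrics can be written in their usual form. The symbols are introduced here for illustration: for a given zone, $n_{jk}$ is the number of test cases with expert grade $g_j$ and predicted grade $g_k$, and $n=\sum_{j,k} n_{jk}$.
\begin{equation*}
Acc = \frac{\sum_{k} n_{kk}}{n}\times 100\%, \qquad
Pre_k = \frac{n_{kk}}{\sum_{j} n_{jk}}, \qquad
Rec_k = \frac{n_{kk}}{\sum_{j} n_{kj}} .
\end{equation*}
Precision and recall are therefore reported separately for each of the three grades, whereas accuracy is a single number per zone.
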
% something that our proposed network is able to exploit. 
We also evaluated sub-variants of our configuration in an ablation setup to analyze the incremental improvement.
% \begin{itemize}

    \myparagraph {Variant-1:} This variant used a single-stage LSTM in which \emph{LSTM-Spatial} was removed from our framework. Instead, the CNN feature maps were simply averaged to construct the feature vector input for each time-point of \emph{LSTM-Temporal}.
    
    \myparagraph {Variant-2:} This variant of our configuration was designed without upsampling the minority label patches and without including edge patches from the neighboring zones.
    
% \end{itemize}
The combined two-stage LSTM was incrementally developed from these more fundamental approaches, and outperformed each of them in accuracy in all six lung zones. For example, in the middle zone of the left lung, the accuracy improved from 69\% (in variant $1$) and 70\% (in variant $2$) to 73\% with our proposed method. The improved performance of our model as compared to variant $2$ also suggests that contextual information from immediately adjoining lung zones plays an important role in the disease trajectory.

We used Cohen's Kappa score ($\mathcal{K}$) to evaluate the agreement between predictions of each approach and the grades assigned by experts. $\mathcal{K}$ values were computed to be 0.503, 0.41, 0.426, 0.541, 0.362, 0.43 for $L_{1}$, $L_{2}$, $L_{3}$, $R_{1}$, $R_{2}$, and $R_{3}$ zones, respectively. We noticed that our model's predictions have higher agreement with the radiologists in the upper lung zones.
% We noticed the relative contribution of 6 zones in our spatial modelling approach. The upper zones of both lungs hold much more importance in our model. 
Also, the average $\mathcal{K}$ values for our approach, baseline $1$, baseline $2$, variant $1$, and variant $2$ were 0.445, 0.256, 0.219, 0.356, and 0.32, respectively. The $\mathcal{K}$ values for the other methods were lower than for our approach.
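
Cohen's Kappa corrects the observed agreement for the agreement expected by chance. In its standard unweighted form, and using notation introduced here for illustration ($p_o$ is the observed fraction of agreeing cases; $a_k$ and $b_k$ are the fractions of cases assigned grade $g_k$ by the experts and by the model, respectively), it reads
\begin{equation*}
\mathcal{K} = \frac{p_o - p_e}{1 - p_e}, \qquad p_e = \sum_{k=0}^{2} a_k\, b_k .
\end{equation*}
Whether an unweighted or a weighted variant was used is not stated above; the unweighted form is assumed here. As a check on the reported average for our approach, $(0.503+0.41+0.426+0.541+0.362+0.43)/6 = 2.672/6 \approx 0.445$.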
%\vspace{-3mm}

\begin{table*}[!t]
\caption{Quantitative results (Accuracy, Precision, Recall) shown for Left lung zones (Upper, Middle, Lower)}
\label{Left results}
\resizebox{\textwidth}{!}{
\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|}
\hline
\multirow{2}{*}{\textbf{Methods}} & \multicolumn{7}{c|}{\textbf{Left Lung Upper}} & \multicolumn{7}{c|}{\textbf{Left Lung Middle}} & \multicolumn{7}{c|}{\textbf{Left Lung Lower}} \\ \cline{2-22} 
 & \textbf{$Acc (\%)$} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$Pre$\\ 0    1    2\end{tabular}}} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$Rec$\\ 0    1    2\end{tabular}}} & \textbf{$Acc (\%)$} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$Pre$\\ 0    1    2\end{tabular}}} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$Rec$\\ 0    1    2\end{tabular}}} & \textbf{$Acc (\%)$} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$Pre$\\ 0    1    2\end{tabular}}} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$Rec$\\ 0    1    2\end{tabular}}} \\ \hline
Baseline-1 & $60 \pm 4.76$ & 0.55 & 0.66 & 0.52 & 0.51 & 0.68 & 0.52 & $64 \pm 5.47$ & 0.5 & 0.71 & 0.59 & 0.47 & 0.74 & 0.56 & $58 \pm 4.89$ & 0.5 & 0.62 & 0.56 & 0.47 & 0.71 & 0.48 \\ \hline
Baseline-2 & $57 \pm 4.63$ & 0.48 & 0.67 & 0.48 & 0.41 & 0.64 & 0.60 & $61 \pm 5.80$ & 0.48 & 0.7 & 0.56 & 0.52 & 0.64 & 0.60 & $55 \pm 4.64$ & 0.5 & 0.60 & 0.50 & 0.52 & 0.70 & 0.43 \\ \hline
Variant-1 & $66 \pm 4.33$ & 0.57 & 0.67 & 0.63 & 0.56 & 0.73 & 0.48 & $69 \pm 4.68$ & 0.45 & 0.72 & 0.51 & 0.65 & 0.74 & 0.56 & $64 \pm 4.87$ & 0.39 & 0.66 & \textbf{0.74} & 0.45 & 0.74 & 0.70 \\ \hline
Variant-2 & $68 \pm 3.51$ & 0.53 & 0.74 & 0.41 & \textbf{0.67} & \textbf{0.78} & 0.63 & $70 \pm 2.89$ & 0.43 & 0.68 & 0.55 & 0.47 & 0.69 & \textbf{0.77} & $61 \pm 4.16$ & 0.35 & 0.57 & 0.64 & 0.48 & \textbf{0.81} & 0.64 \\ \hline
Our Approach & $\mathbf{71} \pm 3.58$ & \textbf{0.69} & \textbf{0.75} & \textbf{0.64} & 0.62 & 0.77 & \textbf{0.69} & $\mathbf{73} \pm 2.56$ & \textbf{0.72} & \textbf{0.77} & \textbf{0.6} & \textbf{0.69} & \textbf{0.83} & 0.52 & $\mathbf{69} \pm 3.94$ & \textbf{0.61} & \textbf{0.73} & 0.67 & 0.52 & 0.73 & \textbf{0.72} \\ \hline
\end{tabular}}
\end{table*}

\begin{table*}[!t]
\caption{Quantitative results (Accuracy, Precision, Recall) shown for Right lung zones (Upper, Middle, Lower)}
\label{Right results}
\resizebox{\textwidth}{!}{\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|}
\hline
\multirow{2}{*}{\textbf{Methods}} & \multicolumn{7}{c|}{\textbf{Right Lung Upper}} & \multicolumn{7}{c|}{\textbf{Right Lung Middle}} & \multicolumn{7}{c|}{\textbf{Right Lung Lower}} \\ \cline{2-22} 
 & \textbf{$Acc (\%)$} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$Pre$\\ 0    1    2\end{tabular}}} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$Rec$\\ 0    1    2\end{tabular}}} & \textbf{$Acc (\%)$} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$Pre$\\ 0    1    2\end{tabular}}} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$Rec$\\ 0    1    2\end{tabular}}} & \textbf{$Acc (\%)$} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$Pre$\\ 0    1    2\end{tabular}}} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$Rec$\\ 0    1    2\end{tabular}}} \\ \hline
Baseline-1 & $64 \pm 3.23$ & 0.56 & 0.72 & 0.52 & 0.60 & 0.71 & 0.50 & $55 \pm 4.08$ & 0.45 & 0.60 & 0.51 & 0.40 & 0.63 & 0.51 & $58 \pm 3.91$ & 0.54 & 0.62 & 0.54 & 0.42 & 0.59 & 0.61 \\ \hline
Baseline-2 & $67 \pm 3.38$ & 0.63 & 0.72 & 0.60 & 0.70 & 0.65 & 0.66 & $52 \pm 3.29$ & 0.47 & 0.60 & 0.39 & 0.45 & 0.63 & 0.37 & $56 \pm 4.36$ & 0.40 & 0.65 & 0.51 & 0.42 & 0.63 & 0.51 \\ \hline
Variant-1 & $72 \pm 3.09$ & 0.65 & 0.69 & 0.52 & 0.67 & 0.73 & 0.62 & $66 \pm 2.62$ & 0.42 & 0.56 & \textbf{0.72} & 0.56 & 0.62 & \textbf{0.80} & $63 \pm 3.85$ & 0.50 & 0.62 & 0.57 & \textbf{0.60} & 0.61 & 0.52 \\ \hline
Variant-2 & $70 \pm 2.81$ & \textbf{0.84} & 0.62 & \textbf{0.67} & 0.70 & 0.64 & 0.57 & $62 \pm 1.76$ & 0.59 & \textbf{0.78} & 0.63 & 0.57 & 0.71 & 0.45 & $64 \pm 3.27$ & \textbf{0.77} & 0.43 & 0.66 & 0.40 & \textbf{0.66} & 0.66 \\ \hline
Our Approach & $\textbf{76} \pm 2.33$ & 0.68 & \textbf{0.84} & 0.66 & \textbf{0.73} & \textbf{0.80} & \textbf{0.66} & $\textbf{67} \pm 2.72$ & \textbf{0.59} & 0.71 & 0.65 & \textbf{0.59} & \textbf{0.75} & 0.58 & $\textbf{65} \pm 3.73$ & 0.50 & \textbf{0.67} & \textbf{0.67} & 0.50 & 0.65 & \textbf{0.69} \\ \hline
\end{tabular}}
\end{table*}


\section{Conclusion}

Imaging changes after the onset of COVID-19 have been studied previously, albeit mostly on CT scans~\cite{liang2020evolution}. Studying imaging evolution with machine learning techniques can complement the understanding of COVID-19 pathogenesis. Portable CXR is a more widely available modality and is well suited to monitoring imaging progression~\cite{khullar2020effects}. Here, we present a novel multi-stage LSTM framework that analyzes serial CXRs to predict changes in imaging severity. Unlike the datasets used in other studies~\cite{duchesne2020tracking}, our models were developed and validated on a unique dataset of sequential CXRs collected over multiple days at two institutions. Unlike generative approaches, our model does not require registration between images from different timepoints. More importantly, our computational approach mirrors the clinical interpretation of medical images by exploiting both the temporal evolution and the spatial context of COVID-19 manifestations on CXRs. In our experiments, this yielded more accurate predictions of future disease evolution than simpler computational models. By predicting future CXR severity scores in COVID-19 patients, our model might enable physicians to adjust the duration and timing of treatments (such as prone ventilation) to improve clinical outcomes. Furthermore, the proposed multi-stage LSTM approach could be applied to monitor progression in other diseases for which multiple sequential images are acquired.
% Acknowledgments---Will not appear in anonymized version
\section*{Acknowledgments}
Reported research was supported by the 2019 OVPR and IEDM seed grants at Stony Brook University, NIGMS T32GM008444, NSF IIS-1909038, and NSF CCF-1855760, and was enabled by the Renaissance SOM at SBU's ``COVID-19 Data Commons and Analytic Environment'', a data quality initiative instituted by the Office of the Dean and supported by the BMI department. The authors have no relevant financial or non-financial interests to disclose.


%\bibliographystyle{plain}
\bibliography{konwer21}
\newpage
\appendix

\section{Additional experiments}
A new baseline (Baseline-3) is formulated as follows.
For each of the 100 cases, we have CXRs from at least four timepoints. We extract features from the images of the first three timepoints and predict the severity grade on the fourth image. At each timepoint, we average the patch features into a single $1 \times 4096$ feature vector. The three resulting vectors, one per timepoint, are concatenated into a $3 \times 4096$ feature matrix. Hence, unlike Baseline-1, we do not average across all timepoints but only within each timepoint. In this way, the encoded representation retains the features specific to each timepoint. The $3 \times 4096$ matrix is finally flattened and passed to a 1D-NN classifier for severity grade prediction.
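
As an illustrative sketch of the Baseline-3 construction, let $f_{t,p} \in \mathbb{R}^{4096}$ denote the feature vector of patch $p$ at timepoint $t$, and let $P_t$ be the number of patches at that timepoint. These symbols are introduced here only for exposition and are not used elsewhere in the paper. Under this notation,
\begin{align*}
\bar{f}_t &= \frac{1}{P_t}\sum_{p=1}^{P_t} f_{t,p} \in \mathbb{R}^{4096}, \qquad t = 1, 2, 3,\\
F &= \begin{bmatrix}\bar{f}_1^{\top}\\ \bar{f}_2^{\top}\\ \bar{f}_3^{\top}\end{bmatrix} \in \mathbb{R}^{3 \times 4096}, \qquad
x = \mathrm{vec}(F) \in \mathbb{R}^{12288},
\end{align*}
where $x$, with $3 \times 4096 = 12288$ entries, is the flattened input to the classifier.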

We also design a new variant (Variant-3), in which we average the feature vectors obtained from the last cell state of the spatial LSTM at each timepoint and provide this averaged representation as the context vector to the decoder module. Quantitative results for Baseline-3 and Variant-3 are presented in Tables~\ref{Left results on B3V3} and~\ref{Right results on B3V3}.

\begin{table*}[!ht]
\caption{Quantitative results of Baseline-3 and Variant-3 on the left lung zones}
\label{Left results on B3V3}
\resizebox{\textwidth}{!}{
\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|}
\hline
\multirow{2}{*}{\textbf{Methods}} & \multicolumn{7}{c|}{\textbf{Left Lung Upper}} & \multicolumn{7}{c|}{\textbf{Left Lung Middle}} & \multicolumn{7}{c|}{\textbf{Left Lung Lower}} \\ \cline{2-22} 
 & \textbf{$Acc (\%)$} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$Pre$\\ 0    1    2\end{tabular}}} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$Rec$\\ 0    1    2\end{tabular}}} & \textbf{$Acc (\%)$} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$Pre$\\ 0    1    2\end{tabular}}} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$Rec$\\ 0    1    2\end{tabular}}} & \textbf{$Acc (\%)$} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$Pre$\\ 0    1    2\end{tabular}}} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$Rec$\\ 0    1    2\end{tabular}}} \\ \hline
Baseline-3 & 54 & 0.56 & 0.71 & 0.45 & 0.57 & 0.62 & 0.48 & 63 & 0.42 & 0.77 & 0.58 & 0.54 & 0.71 & 0.63 & 51 & 0.56 & 0.66 & 0.40 & 0.42 & 0.63 & 0.56 \\ \hline
Variant-3 & 65 & 0.47 & 0.80 & 0.45 & 0.69 & 0.73 & 0.66 & 69 & 0.52 & 0.64 & 0.63 & 0.49 & 0.72 & 0.74 & 63  & 0.56 & 0.52 & 0.67 & 0.41 & 0.79 & 0.60 \\ \hline
\end{tabular}}
\end{table*}

\begin{table*}[!ht]
\caption{Quantitative results of Baseline-3 and Variant-3 on the right lung zones}
\label{Right results on B3V3}
\resizebox{\textwidth}{!}{\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|}
\hline
\multirow{2}{*}{\textbf{Methods}} & \multicolumn{7}{c|}{\textbf{Right Lung Upper}} & \multicolumn{7}{c|}{\textbf{Right Lung Middle}} & \multicolumn{7}{c|}{\textbf{Right Lung Lower}} \\ \cline{2-22} 
 & \textbf{$Acc (\%)$} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$Pre$\\ 0    1    2\end{tabular}}} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$Rec$\\ 0    1    2\end{tabular}}} & \textbf{$Acc (\%)$} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$Pre$\\ 0    1    2\end{tabular}}} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$Rec$\\ 0    1    2\end{tabular}}} & \textbf{$Acc (\%)$} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$Pre$\\ 0    1    2\end{tabular}}} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$Rec$\\ 0    1    2\end{tabular}}} \\ \hline
Baseline-3 & 60 & 0.49 & 0.76 & 0.68 & 0.63 & 0.64 & 0.57 & 56 & 0.51 & 0.64 & 0.48 & 0.52 & 0.60 & 0.61 & 60 & 0.58 & 0.64 & 0.51 & 0.47 & 0.65 & 0.63 \\ \hline
Variant-3 & 69 & 0.72 & 0.69 & 0.63 & 0.66 & 0.82 & 0.63 & 64 & 0.63 & 0.75 & 0.69 & 0.61 & 0.75 & 0.53 & 59 & 0.61 & 0.63 & 0.58 & 0.59 & 0.62 & 0.75 \\ \hline
\end{tabular}}
\end{table*}
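
The context vector of Variant-3 can be written compactly. Assuming that $c_t \in \mathbb{R}^{d}$ denotes the last cell state of the spatial LSTM at timepoint $t$ and that $T$ is the number of input timepoints (notation introduced here for illustration only), the decoder receives
\begin{equation*}
\bar{c} = \frac{1}{T}\sum_{t=1}^{T} c_t \in \mathbb{R}^{d}.
\end{equation*}
Since the mean does not depend on the order of the $c_t$, this context vector by itself carries no information about the temporal ordering of the timepoints.
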
\section{Plot of severity grade distribution across timepoints}

%\centering
\begin{figure}[h]

  	\begin{minipage}[b]{1.0\linewidth}
  		\centering
  		\centerline{\includegraphics[width= 2.5 in]{gradedist.png}}
  	\end{minipage}

  	\caption{Distribution of severity grades (0,1,2) across 13 timepoints
  }
  	\label{fig:dist}
  %	\vspace{-.3cm}
\end{figure}

\section{Network configurations}

In both \emph{LSTM-Spatial} and \emph{LSTM-Temporal}, we stack two LSTM layers for better abstraction ability. The CNN configuration is given in Table~\ref{CNN config}.

\begin{table}[!h]
\tiny
\centering
\caption{CNN configuration}
\label{CNN config}
\begin{tabular}{|c|c|}
\hline
\textbf{Type} & \textbf{Configuration} \\ \hline
Input & 256 $\times$ 256 patches \\ \hline
\begin{tabular}[c]{@{}c@{}}Convolution\\ Max pooling\end{tabular} & \begin{tabular}[c]{@{}c@{}}filter:8, kernel:5$\times$ 5, auto-padding\\ kernel:3, stride:2, auto-padding\\ Output size: 8$\times$128$\times$128\end{tabular} \\ \hline
\begin{tabular}[c]{@{}c@{}}Convolution\\ Max pooling\\  (2$\times$)\end{tabular} & \begin{tabular}[c]{@{}c@{}}filter:16, kernel:3$\times$3, auto-padding\\ kernel:3, stride: 2, auto-padding\\ Output size: 16$\times$32$\times$32\end{tabular} \\ \hline
\begin{tabular}[c]{@{}c@{}}Convolution\\ Max pooling\\  (2$\times$)\end{tabular} & \begin{tabular}[c]{@{}c@{}}filter:32, kernel:3$\times$3, auto-padding\\ kernel:3, stride: 2, auto-padding\\ Output size: 32$\times$8$\times$8\end{tabular} \\ \hline
Fully connected & 256 neurons \\ \hline
\end{tabular}
\end{table}
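
The output sizes listed in Table~\ref{CNN config} can be checked with a short calculation. As an assumption that the table itself does not state, auto-padding is taken to mean that a stride-1 convolution preserves the spatial size and that a stride-$s$ pooling maps a side length $n$ to $\lceil n/s \rceil$. Tracking the side length of the feature map through each block then gives
\begin{align*}
\text{Block 1:}\quad & 256 \xrightarrow{\text{conv}} 256 \xrightarrow{\text{pool}} 128 && \Rightarrow\ 8 \times 128 \times 128,\\
\text{Block 2 } (2\times)\text{:}\quad & 128 \to 64 \to 32 && \Rightarrow\ 16 \times 32 \times 32,\\
\text{Block 3 } (2\times)\text{:}\quad & 32 \to 16 \to 8 && \Rightarrow\ 32 \times 8 \times 8.
\end{align*}
If the final feature map is flattened directly, the fully connected layer receives $32 \cdot 8 \cdot 8 = 2048$ values and maps them to 256 neurons.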

\section{Standard deviations}

\begin{table*}[!h]
\caption{Standard deviations of the results (accuracy, precision, recall) for the left lung zones (upper, middle, lower)}
\label{Left std}
\resizebox{\textwidth}{!}{
\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|}
\hline
\multirow{2}{*}{\textbf{Methods}} & \multicolumn{7}{c|}{\textbf{Left Lung Upper}} & \multicolumn{7}{c|}{\textbf{Left Lung Middle}} & \multicolumn{7}{c|}{\textbf{Left Lung Lower}} \\ \cline{2-22} 
 & \textbf{$std(Acc)$} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$std(Pre)$\\ 0    1    2\end{tabular}}} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$std(Rec)$\\ 0    1    2\end{tabular}}} & \textbf{$std(Acc)$} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$std(Pre)$\\ 0    1    2\end{tabular}}} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$std(Rec)$\\ 0    1    2\end{tabular}}} & \textbf{$std(Acc)$} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$std(Pre)$\\ 0    1    2\end{tabular}}} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$std(Rec)$\\ 0    1    2\end{tabular}}} \\ \hline
Baseline-1 & 4.76 & 0.032 & 0.024 & 0.022 & 0.025 & 0.023 & 0.019 & 5.47 & 0.017 & 0.029 & 0.031 & 0.021 & 0.027 & 0.03 & 4.89 & 0.024 & 0.022 & 0.031 & 0.035 & 0.017 & 0.024 \\ \hline
Baseline-2 & 4.63 & 0.023 & 0.025 & 0.029 & 0.016 & 0.025 & 0.031 & 5.80 & 0.022 & 0.017 & 0.026 & 0.03 & 0.014 & 0.023 & 4.64 & 0.034 & 0.02 & 0.017 & 0.013 & 0.024 & 0.029 \\ \hline
%Baseline-3 & 4.41 & 5.13 & 5.62  \\ \hline
\begin{tabular}[c]{@{}c@{}}Variant-1 \end{tabular} & 4.33 & 0.03 & 0.027 & 0.022 & 0.028 & 0.032 & 0.02 & 4.68 & 0.021 & 0.016 & 0.035 & 0.031 & 0.023 & 0.028 & 4.87 & 0.018 & 0.016 & 0.023 & 0.02 & 0.034 & 0.026\\ \hline
Variant-2 & 3.51 &  0.018 & 0.021 & 0.017 & 0.015 & 0.024 & 0.027 & 2.89 & 0.024 & 0.016 & 0.021 & 0.018 & 0.015 & 0.027 & 4.16 & 0.019 & 0.025 & 0.032 & 0.023 & 0.02 & 0.023 \\ \hline
%Variant-3 & 3.79 & 3.21 & 3.28  \\ \hline
Our Approach & 3.58 & 0.015 & 0.013 & 0.02 & 0.017 & 0.019 & 0.022 & 2.56 & 0.019 & 0.021 & 0.024 & 0.012 & 0.014 & 0.018 & 3.94 & 0.012 & 0.016 & 0.026 & 0.021 & 0.018 & 0.025\\ \hline
\end{tabular}}
\end{table*}

\begin{table*}[!h]
\caption{Standard deviations of the results (accuracy, precision, recall) for the right lung zones (upper, middle, lower)}
\label{Right std}
\resizebox{\textwidth}{!}{\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|}
\hline
\multirow{2}{*}{\textbf{Methods}} & \multicolumn{7}{c|}{\textbf{Right Lung Upper}} & \multicolumn{7}{c|}{\textbf{Right Lung Middle}} & \multicolumn{7}{c|}{\textbf{Right Lung Lower}} \\ \cline{2-22} 
 & \textbf{$std(Acc)$} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$std(Pre)$\\ 0    1    2\end{tabular}}} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$std(Rec)$\\ 0    1    2\end{tabular}}} & \textbf{$std(Acc)$} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$std(Pre)$\\ 0    1    2\end{tabular}}} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$std(Rec)$\\ 0    1    2\end{tabular}}} & \textbf{$std(Acc)$} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$std(Pre)$\\ 0    1    2\end{tabular}}} & \multicolumn{3}{c|}{\textbf{\begin{tabular}[c]{@{}c@{}}$std(Rec)$\\ 0    1    2\end{tabular}}} \\ \hline
Baseline-1 & 3.23 & 0.034 & 0.03 & 0.037 & 0.025 & 0.021 & 0.032 & 4.08 & 0.032 & 0.036 & 0.021 & 0.028 & 0.029 & 0.035 & 3.91 & 0.019 & 0.027 & 0.03 & 0.023 & 0.028 & 0.034 \\ \hline
Baseline-2 & 3.38 & 0.023 & 0.027 & 0.02 & 0.03 & 0.028 & 0.035 & 3.29 & 0.021 & 0.024 & 0.028 & 0.023 & 0.03 & 0.037 & 4.36 & 0.026 & 0.031 & 0.036 & 0.031 & 0.026 & 0.029 \\ \hline
%Baseline-3 & 4.14 & 3.87 & 4.25  \\ \hline
\begin{tabular}[c]{@{}c@{}}Variant-1 \end{tabular} & 3.09 & 0.018 & 0.022 & 0.027 & 0.02 & 0.025 & 0.032 & 2.62 & 0.02 & 0.028 & 0.033 & 0.024 & 0.018 & 0.027 & 3.85 & 0.024 & 0.018 & 0.029 & 0.022 & 0.015 & 0.026 \\ \hline
Variant-2 & 2.81 & 0.02 & 0.018 & 0.023 & 0.015 & 0.023 & 0.026 & 1.76 & 0.017 & 0.026 & 0.031 & 0.022 & 0.025 & 0.021 & 3.27 & 0.018 & 0.014 & 0.021 & 0.019 & 0.016 & 0.024 \\ \hline
%Variant-3 & 3.34 & 2.21 & 3.58 \\ \hline
Our Approach & 2.33 & 0.013 & 0.018 & 0.021 & 0.016 & 0.011 & 0.024 & 2.72 & 0.012 & 0.021 & 0.026 & 0.018 & 0.013 & 0.019 & 3.73 & 0.02 & 0.014 & 0.026 & 0.013 & 0.009 & 0.021 \\ \hline
\end{tabular}}
\end{table*}


\section{Dataset details}
CXRs of 23 patients were obtained from Newark Beth Israel Medical Center, and those of the remaining 77 patients from Stony Brook University Hospital. CXRs from Stony Brook University Hospital were acquired with the portable DRX Revolution machine developed by Carestream Health, using the AP imaging technique. Image acquisition parameters included an average kVp of 90 and an average mA of 2.8. CXRs from Newark Beth Israel Medical Center were acquired with GE Optima XR240 AMX portable machines. Image acquisition parameters included a kVp of 85 and mAs between 4 and 5, with automatic exposure control.
%\vspace{-2in}

\section{Baseline architecture}
\begin{figure}[h]

  	\begin{minipage}[b]{1.0\linewidth}
  		\centering
  		\centerline{\includegraphics[width= 4 in]{baseline_new.png}}
  	\end{minipage}

  	\caption{Architecture of the baseline approach
  }
  	\label{fig:baseline}
  	%\vspace{-.3cm}
\end{figure}

\end{document}
