
Several publicly available CXR datasets can be used for image classification tasks, for example, CheXpert~\cite{irvin2019chexpert}, which contains 224,316 chest radiographs collected from 65,240 patients at Stanford Hospital.
%
In general, these CXR datasets use information extracted from doctors' notes by Natural Language Processing (NLP) as the ground truth for training and validating Machine Learning (ML) models.
%
However, this technique is limited in its ability to handle multi-language ambiguity and uncertainty in radiology reports.
%
Furthermore, most of the annotations are not validated by radiologists or professional physicians, so their quality is not ensured.
%
% Moreover, the label in a radiologist's note may not be entirely accurate due to the biases mentioned above. 
As a result, confidence in labels extracted from radiologists' notes is reduced.
%
Majkowska et al.~\cite{Majkowska2020} proposed a procedure to obtain high-quality labels.
%
However, although this method produces high-quality labels, it is time-consuming and costly; hence, it is only well suited for building high-quality test and validation sets.

%
Active Learning (AL) is a promising approach to the scarcity of high-quality labeled data in the medical domain.
%
The core of AL lies in evaluating the informativeness of data points.
%
The main families of informativeness measures in AL are uncertainty-based, as in Cost-Effective Active Learning (CEAL)~\cite{ceal}, and representation-based, as in Suggestive Annotation~\cite{SugAno}.


In this work, we study the effect of AL methods in regimes with both large and small amounts of available unlabeled data.
%
We present a novel AL method, called Gist-set Online Active Learning (GOAL), for efficient annotation.
%
Our approach further saves annotation costs by reducing the amount of data that needs to be additionally labeled by doctors while maintaining the same performance as training on the full data.
%
Our method shares a similar flow with CEAL but differs from it in two aspects.
%
Firstly, uncertainty and representation are combined for sample selection, which we call Gist-set Selection.
%
Secondly, the pseudo-labels are updated using momentum after each iteration, which we call Online Active Learning. 
%
We evaluate our method on both our private dataset and public datasets.
%
The private dataset covers two findings: \textbf{68,959 positive instances} of Airspace Opacity (AO) and \textbf{12,848 positive instances} of Lung Lesion (LL) out of \textbf{131,030 annotated instances}.
%
For the public datasets, we use Pneumonia (PN) data from the RSNA Pneumonia dataset\footnote{https://www.kaggle.com/c/rsna-pneumonia-detection-challenge}, which contains \textbf{9,555 positive instances} out of \textbf{26,684 instances}, and Pleural Effusion (PE) data from CheXpert~\cite{irvin2019chexpert}, which contains \textbf{86,477 positive instances} out of \textbf{191,027 frontal instances}.