Keywords: Siamese Network, Semi-Supervised, Noisy Data, Bad Data, Contrastive Loss.
Abstract: Noisy data in medical imaging datasets can often aid the development of robust models that are equipped to handle real-world data. However, bad data that contains insufficient anatomical information can severely degrade a model's performance. We propose a novel methodology that uses a semi-supervised Siamese network to identify bad data. The method requires only a small pool of 'reference' medical images, reviewed by a non-expert human to ensure that the major anatomical structures are present in the field of view. The model trains on this reference set and identifies bad data by using the Siamese network to compute the distance between the reference set and every other medical image in the dataset. This methodology achieves an Area Under the Curve (AUC) of 0.989 for identifying bad data. Code will be available at https://git.io/JYFuV.
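The distance-based scoring described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the Siamese network has already mapped each image to an embedding vector, uses the common Hadsell-style form of the contrastive loss, and scores each image by its mean Euclidean distance to the reference embeddings. The function names `contrastive_loss` and `bad_data_scores`, the `margin` value, and the mean-distance aggregation are all hypothetical choices made for illustration.

```python
import numpy as np


def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Contrastive loss over a batch of embedding pairs (assumed form).

    same[i] is 1 if pair i should be close and 0 if it should be
    at least `margin` apart.
    """
    emb_a = np.asarray(emb_a, dtype=float)
    emb_b = np.asarray(emb_b, dtype=float)
    same = np.asarray(same, dtype=float)
    # Euclidean distance between the two embeddings of each pair.
    d = np.linalg.norm(emb_a - emb_b, axis=1)
    # Similar pairs are pulled together; dissimilar pairs are pushed
    # apart until they are at least `margin` away from each other.
    pull = same * d ** 2
    push = (1.0 - same) * np.maximum(margin - d, 0.0) ** 2
    return float(np.mean(pull + push))


def bad_data_scores(reference_emb, candidate_emb):
    """Mean distance from each candidate embedding to the reference set.

    A larger score means the candidate lies further from the reviewed
    reference images and is more likely to be bad data (assumed rule).
    """
    reference_emb = np.asarray(reference_emb, dtype=float)
    candidate_emb = np.asarray(candidate_emb, dtype=float)
    # Pairwise differences via broadcasting:
    # shape (n_candidates, n_references, embedding_dim).
    diffs = candidate_emb[:, None, :] - reference_emb[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    return dists.mean(axis=1)


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for network outputs.
    reference = [[0.0, 0.0], [0.0, 2.0]]
    candidates = [[0.0, 1.0], [3.0, 1.0]]
    print(bad_data_scores(reference, candidates))
```

In this sketch, thresholding the scores yields a bad-data flag, and sweeping that threshold is what an AUC summarises.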
Paper Type: methodological development
Primary Subject Area: Learning with Noisy Labels and Limited Data
Secondary Subject Area: Application: Radiology
Paper Status: original work, not submitted yet
Source Code Url: Code will be made publicly available at https://git.io/JYFuV.
Data Set Url: The dataset is available at https://stanfordmlgroup.github.io/competitions/mrnet/
Registration: I acknowledge that publication of this work at MIDL and in the proceedings requires at least one of the authors to register and present the work during the conference.
Authorship: I confirm that I am the author of this work and that it has not been submitted to another publication before.
Source Latex: zip