Semi-Supervised Siamese Network for Identifying Bad Data in Medical Imaging Datasets

06 Apr 2021, 21:25 (edited 01 Jun 2021) | MIDL 2021 Conference Short Submission | Readers: Everyone
  • Keywords: Siamese Network, Semi-Supervised, Noisy Data, Bad Data, Contrastive Loss
  • Abstract: Noisy data in medical imaging datasets can often help in developing robust models that are equipped to handle real-world data. However, if noisy images contain insufficient anatomical information (bad data), they can severely degrade a model's performance. We propose a novel method that uses a semi-supervised Siamese network to identify bad data. The method requires a non-expert to review only a small pool of 'reference' medical images to ensure that the major anatomical structures are present in the field of view. The model is trained on this reference set and identifies bad data by using the Siamese network to compute the distance between the reference set and every other medical image in the dataset. The method achieves an Area Under the Curve (AUC) of 0.989 for identifying bad data. Code will be available at
  • Paper Type: methodological development
  • Primary Subject Area: Learning with Noisy Labels and Limited Data
  • Secondary Subject Area: Application: Radiology
  • Paper Status: original work, not submitted yet
  • Source Code Url: Code will be made publicly available at
  • Data Set Url: The Data Set is available at the following URL
  • Registration: I acknowledge that publication of this at MIDL and in the proceedings requires at least one of the authors to register and present the work during the conference.
  • Authorship: I confirm that I am the author of this work and that it has not been submitted to another publication before.
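The distance-based scoring described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the network, the loss details, or how distances to the reference set are aggregated. Here we assume a standard margin-based contrastive loss for training, use random vectors as hypothetical stand-ins for the Siamese network's image embeddings, score each image by its mean Euclidean distance to the reference embeddings (an assumption; the minimum distance would be an equally plausible choice), and compute the ROC AUC by pairwise ranking.

```python
import numpy as np


def contrastive_loss(d, same, margin=1.0):
    """Standard margin-based contrastive loss (assumed form, not taken from the paper).

    d:    Euclidean distances between the embeddings of each image pair, shape (n,)
    same: 1 if the pair is similar (e.g. both acceptable images), 0 otherwise
    Similar pairs are pulled together; dissimilar pairs are pushed beyond `margin`.
    """
    d = np.asarray(d, dtype=float)
    same = np.asarray(same, dtype=float)
    pull = same * d**2
    push = (1.0 - same) * np.maximum(margin - d, 0.0) ** 2
    return float(np.mean(0.5 * (pull + push)))


def bad_data_scores(ref_emb, query_emb):
    """Mean Euclidean distance from each query embedding to all reference embeddings.

    A larger score means the image looks less like the human-reviewed references,
    so it is more likely to be bad data. Mean aggregation is an assumption.
    """
    diff = query_emb[:, None, :] - ref_emb[None, :, :]  # shape (queries, refs, dim)
    dists = np.linalg.norm(diff, axis=-1)                # shape (queries, refs)
    return dists.mean(axis=1)


def roc_auc(scores, is_bad):
    """ROC AUC as the probability that a random bad image outscores a random good one.

    Ties count as half a correct ranking (Mann-Whitney formulation).
    """
    scores = np.asarray(scores, dtype=float)
    is_bad = np.asarray(is_bad, dtype=bool)
    pos, neg = scores[is_bad], scores[~is_bad]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


# Toy usage: synthetic vectors stand in for hypothetical Siamese-network embeddings.
rng = np.random.default_rng(0)
ref = rng.normal(0.0, 0.1, size=(5, 8))    # embeddings of the reviewed reference images
good = rng.normal(0.0, 0.1, size=(20, 8))  # images resembling the references
bad = rng.normal(1.0, 0.1, size=(4, 8))    # images far from the references
queries = np.vstack([good, bad])
labels = np.array([0] * 20 + [1] * 4)

scores = bad_data_scores(ref, queries)
print("toy AUC:", roc_auc(scores, labels))
```

On this synthetic data the two groups are well separated, so the toy AUC is close to 1; it says nothing about the reported 0.989, which comes from the authors' real dataset. Scoring against a small reviewed reference set means only the references need human labels, which matches the semi-supervised setting the abstract describes.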