Common limitations of performance metrics in biomedical image analysis

Annika Reinke; Matthias Eisenmann; Minu Dietlinde Tizabi; Carole H. Sudre; Tim Rädsch; Michela Antonelli; Tal Arbel; Spyridon Bakas; M. Jorge Cardoso; Veronika Cheplygina; Keyvan Farahani; Ben Glocker; Doreen Heckmann-Nötzel; Fabian Isensee; Pierre Jannin; Charles Kahn; Jens Kleesiek; Tahsin Kurc; Michal Kozubek; Bennett A. Landman; Geert Litjens; Klaus Maier-Hein; Anne Martel; Bjoern Menze; Henning Müller; Jens Petersen; Mauricio Reyes; Nicola Rieke; Bram Stieltjes; Ronald M. Summers; Sotirios A. Tsaftaris; Bram van Ginneken; Annette Kopp-Schneider; Paul Jäger; Lena Maier-Hein

Published: 11 May 2021, Last Modified: 06 Mar 2025, MIDL 2021 Poster, Readers: Everyone

Keywords: Segmentation, Validation, Metrics, Challenges, Good Scientific Practice

Abstract: While the importance of automatic biomedical image analysis is increasing at an enormous pace, recent meta-research has revealed major flaws in algorithm validation. Performance metrics are key for objective, transparent and comparative performance assessment, but little attention has been given to their pitfalls. Under the umbrella of the Helmholtz Imaging Platform (HIP), three international initiatives (the MICCAI Society's challenge working group, the Biomedical Image Analysis Challenges (BIAS) initiative, and the benchmarking working group of the MONAI framework) have now joined forces with the mission to generate best practice recommendations with respect to metrics in medical image analysis. Consensus building is achieved via a Delphi process, a popular tool for integrating opinions in large international consortia. The current document serves as a teaser for the presentation of the results and focuses on the pitfalls of the most commonly used metric in biomedical image analysis, the Dice Similarity Coefficient (DSC), in the categories of (1) mathematical properties/edge cases, (2) task/metric fit and (3) metric aggregation. Because the framework has been compiled by a large group of experts from more than 30 institutes worldwide, we believe that it could be of general interest to the MIDL community and will improve the quality of biomedical image analysis algorithm validation.
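As a rough illustration of the kind of DSC pitfall the abstract refers to, the sketch below computes the standard DSC, 2|A ∩ B| / (|A| + |B|), on toy masks represented as sets of pixel coordinates. The helper names (`dice`, `square`) and the toy shapes are assumptions made for this example only; they are not taken from the paper, whose own examples are graphical and fictitious.

```python
from typing import Optional, Set, Tuple

Pixel = Tuple[int, int]  # (row, column) of a foreground pixel


def dice(reference: Set[Pixel], prediction: Set[Pixel]) -> Optional[float]:
    """Standard DSC = 2|A ∩ B| / (|A| + |B|).

    Returns None when both masks are empty, because the formula is 0/0 there.
    Which value to report in that case is a convention, not part of the metric.
    """
    denominator = len(reference) + len(prediction)
    if denominator == 0:
        return None
    return 2 * len(reference & prediction) / denominator


def square(top: int, left: int, size: int) -> Set[Pixel]:
    """Hypothetical helper: a filled size x size square mask."""
    return {(r, c) for r in range(top, top + size) for c in range(left, left + size)}


# The same one-pixel horizontal shift, applied to a small and a large object.
dsc_small = dice(square(0, 0, 2), square(0, 1, 2))    # overlap 2 of 4 pixels
dsc_large = dice(square(0, 0, 20), square(0, 1, 20))  # overlap 380 of 400 pixels

# Edge case: neither reference nor prediction contains the structure.
dsc_empty = dice(set(), set())

print(dsc_small, dsc_large, dsc_empty)
```

Under these assumptions, the identical one-pixel error costs the 2 x 2 object half of its score while the 20 x 20 object barely changes, and the empty case has no defined score at all. Any mean over several images therefore depends on how such undefined cases are handled, which touches both the edge-case and the aggregation categories named above.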

Paper Type: validation/application paper

Primary Subject Area: Segmentation

Secondary Subject Area: Application: Other

Paper Status: original work, not submitted yet

Source Code Url: This short paper raises awareness about some common flaws of the most frequently used segmentation metric in the biomedical image analysis community in a graphical manner, based on fictitious examples. Therefore, no source code was produced.

Data Set Url: This short paper raises awareness about some common flaws of the most frequently used segmentation metric in the biomedical image analysis community in a graphical manner, based on fictitious examples. Therefore, no further datasets were used.

Registration: I acknowledge that publication of this at MIDL and in the proceedings requires at least one of the authors to register and present the work during the conference.

Authorship: I confirm that I am the author of this work and that it has not been submitted to another publication before.
