Bias Assessment and Data Drift Detection in Medical Image Analysis: A Survey

TMLR Paper 5810 Authors

04 Sept 2025 (modified: 22 Sept 2025) · Under review for TMLR · CC BY 4.0
Abstract: Machine learning (ML) models have achieved expert-level performance across a range of diagnostic tasks in medical image analysis, yet their adoption in clinical practice remains limited due to concerns over reliability, fairness, and robustness. Two key threats to trustworthy deployment are bias, arising primarily during model development, and data drift, which occurs post-deployment as data distributions change over time. Although conceptually distinct, these two phenomena are often conflated in the literature or addressed in isolation, despite their potential to interact and jointly undermine model performance. We argue that clearly distinguishing between bias and data drift is essential for developing appropriate reliability strategies: methods designed to mitigate bias during training differ fundamentally from those needed to detect and manage drift in deployment. In this survey, we therefore bring these perspectives together within a unified framework, clarifying their boundaries while also highlighting where they intersect. We present a comprehensive review of methods for assessing and monitoring ML reliability in medical image analysis, focusing on disease classification models. We first define and distinguish bias and data drift, illustrate their manifestations in clinical contexts, and categorise their sources. We then review state-of-the-art approaches for bias encoding assessment and data drift detection, as well as methods for estimating model performance degradation when ground truth labels are not immediately available. Our synthesis highlights methodological gaps, particularly in evaluating drift detection techniques on real-world medical data, and outlines open challenges for future research. 
By consolidating these perspectives and providing accessible explanations for both technical and clinical audiences, this work aims to support collaboration between developers, clinicians, and healthcare institutions in building fair, transparent, and reliable ML systems for clinical use.
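As a minimal illustration of the kind of post-deployment drift-detection method this survey reviews (a sketch for orientation, not a method proposed in the paper), one can compare a scalar image statistic between a development-time reference set and an incoming deployment batch using a two-sample Kolmogorov-Smirnov statistic; the example data and threshold below are hypothetical:

```python
import numpy as np

def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    between the empirical CDFs of the two samples."""
    ref = np.sort(np.asarray(reference, dtype=float))
    cur = np.sort(np.asarray(current, dtype=float))
    grid = np.concatenate([ref, cur])
    cdf_ref = np.searchsorted(ref, grid, side="right") / ref.size
    cdf_cur = np.searchsorted(cur, grid, side="right") / cur.size
    return float(np.max(np.abs(cdf_ref - cdf_cur)))

# Hypothetical data: mean pixel intensities per scan, normalised.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 1000)  # development-time distribution
shifted = rng.normal(0.8, 1.0, 1000)    # deployment batch, e.g. a new scanner

in_dist_stat = ks_statistic(reference, rng.normal(0.0, 1.0, 1000))
drift_stat = ks_statistic(reference, shifted)
drift_detected = drift_stat > 0.1  # threshold is illustrative, not from the paper
```

In practice, surveyed approaches monitor richer representations (e.g. learned features) and account for multiple testing across monitored statistics, but the core idea of comparing reference and deployment distributions is the same.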
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Jose_Dolz1
Submission Number: 5810