Abstract: In medical imaging, deep learning has greatly enhanced image quality; however, concerns have arisen about how well network performance generalizes to datasets with characteristics not seen during training (i.e., out-of-distribution data). In this paper, we demonstrate that this generalization issue creates an unforeseen obstacle to performance comparison among neural networks. The issue arises when no public dataset is available, which is a common condition in medical imaging due to cost and regulation. As a result, many current studies use their own private datasets for network training and evaluation. This popular but unscrutinized practice means that performance comparisons among networks are often made using datasets of different characteristics, raising questions about the fairness of the comparison. In this study, we investigate the issue of fair comparison in deep learning-based quantitative susceptibility mapping (QSM). Two comparison options are considered: comparison with a pre-trained network and comparison with a re-trained network. The results demonstrate that both options can lead to unfair comparison because of differences in the characteristics of private datasets or performance differences between the re-trained network and the original network. Additionally, we report that deep neural networks have reproducibility issues that can undermine a fair comparison even when the same dataset is used. These results are well known but often overlooked in performance comparisons, highlighting the issue of fair comparison in current practice. This paper suggests that a public dataset and a trained network, accompanied by the information needed for reproducibility, are necessary for a fair comparison of deep learning-based QSM.
External IDs: doi:10.1109/access.2025.3620427