Abstract: Medical image superresolution models are typically trained on synthetic low-resolution/high-resolution image pairs. This is easier but does not reflect the real-world challenge of learning superresolution for clinical CT scans from micro-CT. Furthermore, most medical image superresolution models are evaluated using standard image similarity metrics such as Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM), even though it is well known that these favor image characteristics that are unsuitable in a medical imaging setting. In this paper, we investigate how training on real versus synthetic data affects model performance, and we evaluate models using measures that are relevant for clinical bone analysis. We train two well-established superresolution models, SRGAN and ESRGAN, on two real-world multiscale human bone CT datasets. We compare this approach with training the models on synthetic data, where low-resolution images are produced by subsampling and blurring high-resolution images, as is the common approach when assessing superresolution architectures for medical images. When evaluating performance, we compute the clinically relevant bone measures Bone Volume Fraction, Thickness, and Degree of Anisotropy, as well as PSNR and SSIM. We find that training on real data generally makes the models perform worse than training on synthetic data, suggesting that more can be learned from using real-world data when exploring medical image superresolution. Moreover, the similarity metrics do not always agree with the bone measures on which model achieves the lowest error, indicating that including the bone measures in our evaluation gives a fuller picture of how our models perform.
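To make the synthetic-pair setup concrete, the sketch below shows one common way low-resolution images are produced from high-resolution CT volumes: Gaussian blurring followed by subsampling. The function name, kernel width, and scale factor are our own illustrative assumptions, not details taken from this paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_synthetic_lr(hr_volume: np.ndarray, scale: int = 4, sigma: float = 1.0) -> np.ndarray:
    """Degrade a high-resolution CT volume into a synthetic low-resolution one.

    Blur with a Gaussian kernel to mimic a coarser point spread function,
    then subsample by `scale` along every axis. Both `sigma` and `scale`
    are placeholder values; real pipelines tune them to the target scanner.
    """
    blurred = gaussian_filter(hr_volume.astype(np.float32), sigma=sigma)
    # Strided subsampling; interpolation-based downsampling is a common alternative.
    return blurred[(slice(None, None, scale),) * hr_volume.ndim]
```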
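For context, Bone Volume Fraction (BV/TV) is the simplest of the three bone measures: the proportion of voxels classified as bone after segmentation. A minimal sketch, assuming a global intensity threshold; the segmentation procedure actually used for a given dataset is not specified here and may differ.

```python
import numpy as np

def bone_volume_fraction(volume: np.ndarray, threshold: float = 0.5) -> float:
    """Bone Volume Fraction: bone voxels divided by total voxels.

    Assumes `volume` holds intensities normalized so that a single global
    `threshold` separates bone from background; in practice the threshold
    (or a method such as Otsu's) is chosen per dataset.
    """
    bone_mask = volume >= threshold
    return float(bone_mask.sum()) / bone_mask.size
```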