Abstract: We introduce MURA, a large dataset of musculoskeletal radiographs containing 40,562 images from 14,864 studies, where each study is manually labeled by radiologists as either normal or abnormal. On this dataset, we train a 169-layer densely connected convolutional network to detect and localize abnormalities. To evaluate our model robustly and to estimate radiologist performance, we collect additional labels from six board-certified Stanford radiologists on the test set, which consists of 207 musculoskeletal studies. On this test set, the majority vote of a group of three radiologists serves as the gold standard. The model achieves an AUROC of 0.929, with an operating point of 0.815 sensitivity and 0.887 specificity. We also compare our model and radiologists on Cohen's kappa statistic, which expresses the agreement of our model and of each radiologist with the gold standard. Overall, we find that our model achieves performance comparable to that of radiologists. By study type, model performance is comparable to the best radiologist performance in detecting abnormalities on finger and wrist studies. However, model performance is lower than the best radiologist performance in detecting abnormalities on elbow, forearm, hand, humerus, and shoulder studies, indicating that the task is a good challenge for future research. To encourage advances, we have made our dataset freely available at http://stanfordmlgroup.github.io/competitions/mura.
Keywords: Convolutional Neural Network, Deep Learning, Musculoskeletal Radiographs, Abnormality Detection
Author Affiliation: Stanford University Department of Computer Science, Stanford University Department of Medicine, Stanford University Department of Radiology
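
The evaluation quantities named in the abstract (a majority-vote gold standard, Cohen's kappa against that gold standard, and sensitivity and specificity at an operating point) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the six toy studies, and the toy model predictions below are all hypothetical and chosen only to make the arithmetic easy to follow. Labels are assumed to be 1 for abnormal and 0 for normal.

```python
from collections import Counter


def majority_vote(votes):
    """Label chosen by most readers for one study (assumes an odd number of votes)."""
    return Counter(votes).most_common(1)[0][0]


def cohen_kappa(a, b):
    """Cohen's kappa between two equal-length label sequences."""
    n = len(a)
    # Observed agreement: fraction of studies where both raters agree.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    labels = set(a) | set(b)
    p_e = sum((a.count(label) / n) * (b.count(label) / n) for label in labels)
    if p_e == 1.0:
        # Degenerate case: both raters always give the same single label.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def sensitivity_specificity(gold, pred):
    """Sensitivity and specificity of binary predictions (1 = abnormal)."""
    tp = sum(g == 1 and p == 1 for g, p in zip(gold, pred))
    fn = sum(g == 1 and p == 0 for g, p in zip(gold, pred))
    tn = sum(g == 0 and p == 0 for g, p in zip(gold, pred))
    fp = sum(g == 0 and p == 1 for g, p in zip(gold, pred))
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: three reader votes for each of six studies.
reader_votes = [(1, 1, 0), (0, 0, 0), (1, 0, 1), (0, 1, 0), (1, 1, 1), (0, 0, 1)]
gold = [majority_vote(v) for v in reader_votes]

# Hypothetical binary model predictions for the same six studies.
model_pred = [1, 0, 0, 0, 1, 1]

kappa = cohen_kappa(gold, model_pred)
sens, spec = sensitivity_specificity(gold, model_pred)

if __name__ == "__main__":
    print("gold standard:", gold)
    print("kappa:", round(kappa, 3))
    print("sensitivity:", round(sens, 3), "specificity:", round(spec, 3))
```

The same `cohen_kappa` function applies unchanged to a radiologist's labels in place of `model_pred`, which is what makes the statistic usable for comparing the model and individual readers against one gold standard.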