Keywords: Mammography dataset, ordinal classification, breast cancer
TL;DR: We release CSAW-M, a large public mammography database annotated for masking potential, and benchmark models on it.
Abstract: Interval and large invasive breast cancers, which are associated with worse prognosis than other cancers, are usually detected at a late stage due to false negative assessments of screening mammograms. The missed screening-time detection is commonly caused by the tumor being obscured by its surrounding breast tissues, a phenomenon called masking. To study and benchmark mammographic masking of cancer, we introduce CSAW-M, the largest public mammographic dataset, collected from over 10,000 individuals and annotated with masking potential. In contrast to previous approaches, which measure breast image density as a proxy, our dataset directly provides annotations of masking potential from five specialists. We also trained deep learning models on CSAW-M to estimate the masking level and showed that the estimated masking is significantly more predictive of screening participants diagnosed with interval and large invasive cancers -- without being explicitly trained for these tasks -- than its breast density counterparts.
Supplementary Material: zip
URL: Our dataset can be accessed using the DOI: 10.17044/scilifelab.14687271. Access to the dataset files is given upon agreeing to the terms and sending a request. We note that the dataset webpage is self-contained; that is, all the data and metadata files needed for the user to understand how to use the data are available in the same place.
Contribution Process Agreement: Yes
Dataset Url: The dataset is available at https://doi.org/10.17044/scilifelab.14687271 and information about accessing it is given on that webpage.
License: Please refer to the dataset webpage for the license.
Author Statement: Yes