BoneMet: An Open Large-Scale Multi-Modal Murine Dataset for Breast Cancer Bone Metastasis Diagnosis and Prognosis
Keywords: Medical Dataset, Breast Cancer Bone Metastasis, Diagnosis, Prognosis, Sparse CT reconstruction, CT, X-ray, Large language model, AI for Science
Abstract: Breast cancer bone metastasis (BCBM) affects women’s health globally, calling
for the development of effective diagnosis and prognosis solutions. While deep
learning has exhibited impressive capacities across various healthcare domains, its
applicability to BCBM is consistently hindered by the lack of an open,
large-scale, deep learning-ready dataset. As such, we introduce the Bone Metastasis
(BoneMet) dataset, the first large-scale, publicly available, high-resolution medical
resource, which is derived from a well-accepted murine BCBM model. The unique
advantage of BoneMet over existing human datasets is repeated sequential scans
per subject over the entire disease development phases. The dataset consists of
over 67 terabytes of multi-modal medical data, including 2D X-ray images, 3D
CT scans, and detailed biological data (e.g., medical records and bone quantitative
analysis), collected from more than five hundred mice over the period from 2019 to
2024. Our BoneMet dataset is well-organized into six components, i.e., Rotation
X-Ray, Recon-CT, Seg-CT, Regist-CT, RoI-CT, and MiceMediRec. We further
show that BoneMet can be readily adopted to build versatile, large-scale AI models
for managing BCBM diseases in terms of diagnosis using 2D or 3D images, prognosis of bone deterioration, and sparse-angle 3D reconstruction for safe long-term
disease monitoring. Our preliminary results demonstrate that BoneMet has the
potential to jump-start the development and fine-tuning of AI-driven solutions
prior to their application to human patients. To facilitate easy access and
wide dissemination, we have created the BoneMet package, providing three APIs
that enable researchers to (i) flexibly process and download the BoneMet data
filtered by specific time frames; and (ii) develop and train large-scale AI models for
precise BCBM diagnosis and prognosis. The BoneMet dataset is officially available on Hugging Face Datasets at https://huggingface.co/datasets/BoneMet/BoneMet. The BoneMet package is available on the Python Package Index (PyPI) at https://pypi.org/project/BoneMet. Code and tutorials are available at https://github.com/Tiankuo528/BoneMet.
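As a rough illustration of the sparse-angle setting mentioned in the abstract, the sketch below picks an evenly spaced subset of projection views from a full rotation, which is the usual starting point for sparse-angle reconstruction experiments. It is not taken from the BoneMet package: the function name `sparse_view_indices`, the 360-view rotation, and the 36-view subset are all assumptions chosen for illustration only.

```python
def sparse_view_indices(n_full: int, n_sparse: int) -> list[int]:
    """Return n_sparse evenly spaced view indices out of n_full views.

    Hypothetical helper, not part of the BoneMet package.
    """
    if not 0 < n_sparse <= n_full:
        raise ValueError("need 0 < n_sparse <= n_full")
    # Integer arithmetic keeps the spacing exact and avoids
    # selecting both the 0-degree and the 360-degree view.
    return [(i * n_full) // n_sparse for i in range(n_sparse)]


# Assumed acquisition: 360 projections covering one full rotation.
N_FULL = 360
idx = sparse_view_indices(N_FULL, 36)

# Rotation angle (in degrees) of each retained view.
angles_deg = [360.0 * i / N_FULL for i in idx]

print(idx[:4], angles_deg[:4])
```

Under these assumptions, the retained indices would be used to subsample a stack of rotation X-ray projections before passing them to a reconstruction model; fewer views mean a lower radiation dose per scan at the cost of a harder reconstruction problem.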
Primary Area: datasets and benchmarks
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12268