Bone Anomaly Identification and Localization from Musculoskeletal X-ray images using Hybrid DenseNet-Vision Transformers.

02 Dec 2025 (modified: 15 Dec 2025) | MIDL 2026 Conference Submission | CC BY 4.0
Keywords: Musculoskeletal anomalies, Fracture, Arthritis, Visual transformers
Abstract: Musculoskeletal anomalies present a global health challenge and often require expert review for diagnosis. Automated detection systems can alleviate this burden and facilitate timely intervention. We introduce a hybrid framework that combines DenseNet and Vision Transformers (ViT) to detect anomalies in the MURA dataset. The model uses a CNN for local feature extraction and a ViT for global context modeling, which lets it capture the subtle features of bone pathologies. Our binary model achieved 92% accuracy across seven bone types and demonstrated strong generalizability in cross-anatomy testing. To provide fine-grained annotations, we collaborated with an orthopedic expert to label 1,379 finger radiographs with bounding boxes and three sub-anomaly categories: arthritis, fracture, and implant. On this data, the framework reached 96% classification accuracy and 91% localization efficiency (AUC), marking the first instance of sub-anomaly classification on MURA finger data, with bounding box predictions that provide visual validation for clinicians. We also analyze the importance of positional encodings in ViT and show that they are necessary for localization but not for binary classification. This work offers a new annotated dataset and a robust framework, advancing automated orthopedic anomaly detection.
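One plausible intuition for the positional-encoding finding can be sketched with a toy example. The code below is not the authors' model: it assumes a single self-attention layer with identity query/key/value projections, mean pooling over patch tokens, and random stand-in values for both patch features and positional encodings (all names are hypothetical). It illustrates that, without positional encodings, the pooled representation is unchanged when patches are reordered, so a pooled classifier cannot use patch location, whereas adding positional encodings makes the output depend on where each patch sits.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections (toy assumption)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def mean_pool(tokens):
    """Average the token vectors into one image-level vector."""
    n = len(tokens)
    return [sum(t[j] for t in tokens) / n for j in range(len(tokens[0]))]


def encode(patches, pos=None):
    """Optionally add positional encodings, apply attention, then mean-pool."""
    if pos is not None:
        patches = [[p + e for p, e in zip(patch, enc)] for patch, enc in zip(patches, pos)]
    return mean_pool(self_attention(patches))


def max_diff(a, b):
    """Largest absolute element-wise difference between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


rng = random.Random(0)
n_patches, dim = 6, 4
# Random stand-ins for CNN patch features and positional encodings.
patches = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_patches)]
pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_patches)]
shuffled = patches[::-1]  # same patches, different spatial order

# Without positional encodings the pooled output ignores patch order.
diff_without = max_diff(encode(patches), encode(shuffled))
# With positional encodings the same reordering changes the pooled output.
diff_with = max_diff(encode(patches, pos), encode(shuffled, pos))

print(f"no positional encoding:   {diff_without:.2e}")
print(f"with positional encoding: {diff_with:.2e}")
```

Under these assumptions, the first difference is zero up to floating-point rounding, while the second is not. Predicting a bounding box requires knowing where a patch lies in the image, which is consistent with the abstract's observation that positional encodings matter for localization but not for binary classification.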
Primary Subject Area: Detection and Diagnosis
Secondary Subject Area: Application: Other
Registration Requirement: Yes
Visa & Travel: Yes
Read CFP & Author Instructions: Yes
Originality Policy: Yes
Single-blind & Not Under Review Elsewhere: Yes
LLM Policy: Yes
Submission Number: 227