\section{Conclusion}
Our approach adapts video-based transfer learning through task-specific self-supervised domain adaptation, enabling Vision Transformers to detect vertebral fractures in 3D CT scans. This study advances the state of the art in vertebral fracture detection and indicates the potential of task-specific pretraining for other medical image analysis tasks. Future research could create task-specific pretraining datasets for further applications and evaluate how well our approach generalizes to them. By addressing the challenge of limited data in medical image analysis, our work offers a promising route to accurate, interpretable, and resource-efficient clinical decision support, and ultimately to improved patient care.