Abstract: The remarkable performance of the Transformer architecture in natural language processing has recently also triggered broad
interest in Computer Vision. Among other merits, Transformers have been shown to be capable of learning long-range dependencies and spatial correlations, a clear advantage over convolutional neural networks (CNNs), which have so far been the de facto standard in Computer Vision. Transformers have thus become an integral part of modern medical image analysis. In this paper, we provide an encyclopedic review of the applications of Transformers in medical imaging. Specifically, we present a systematic and thorough survey of recent Transformer literature for different medical image analysis
tasks, including classification, segmentation, detection, registration, synthesis, and clinical report generation. For each of these
applications, we investigate the novelty, strengths, and weaknesses of the proposed strategies and develop taxonomies
highlighting key properties and contributions. Further, if applicable, we outline current benchmarks on different datasets. Finally,
we summarize key challenges and discuss different future research directions.