Artificial Intelligence–Based Methods for Integrating Local and Global Features for Brain Cancer Imaging: Scoping Review (Preprint)
Abstract:

Background: Transformer-based models are gaining popularity in medical imaging and cancer imaging applications. Many recent studies have demonstrated the use of transformer-based models for brain cancer imaging applications such as diagnosis and tumor segmentation.

Objective: This scoping review explores how different vision transformers have contributed to advancing brain cancer diagnosis and tumor segmentation using brain image data. It examines the architectures developed to enhance brain tumor segmentation and explores how vision transformer-based models have augmented the performance of convolutional neural networks for brain cancer imaging.

Methods: The study search and study selection were performed following the guidelines of PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews). The search comprised four popular scientific databases: PubMed, Scopus, IEEE Xplore, and Google Scholar. The search terms were formulated to cover the interventions (i.e., vision transformers) and the target application (i.e., brain cancer imaging). Title and abstract screening for study selection was performed by two reviewers independently and validated by a third reviewer. Data extraction was performed by two reviewers and validated by a third reviewer. Finally, the data were synthesized using a narrative approach.

Results: Of the 688 retrieved studies, 22 were included in this review, all published in 2021 and 2022. The most commonly addressed task was tumor segmentation using vision transformers; no study reported early detection of brain cancer. Among the different vision transformer architectures, Swin Transformer-based architectures have recently become the most popular choice of the research community. Among the included architectures, UNETR and TransUNet had the highest number of parameters and thus required a cluster of as many as eight GPUs for model training. The most popular dataset used in the included studies was the BraTS dataset. Vision transformers were used in different combinations with convolutional neural networks to capture both the global and local context of the input brain imaging data.

Conclusions: The computational complexity of transformer architectures remains a bottleneck for advancing the field and enabling clinical translation. This review presents the current state of knowledge on the topic, and its findings will be helpful for researchers in the field of medical artificial intelligence and its applications in brain cancer.
External IDs:doi:10.2196/preprints.47445