Improvement of TransUNet Using Word Patches Created from Different Dataset

Ayato Takama, Satoshi Kamiya, Kazuhiro Hotta

Published: 2024, Last Modified: 02 Apr 2026, ICPRAM 2024, CC BY-SA 4.0

Abstract: UNet is widely used in medical image segmentation, but it cannot sufficiently capture global information. TransUNet, in contrast, achieves better accuracy than conventional UNet by combining a CNN, which is good at extracting local features, with a Transformer, which is good at extracting global features. In general, TransUNet requires a large amount of training data, but training images are limited in the medical domain. In addition, the encoder of TransUNet uses a model pre-trained on ImageNet, which consists of natural images, and the difference between medical images and natural images is a problem. In this paper, we propose a method to learn Word Patches from other medical datasets and effectively utilize them for training TransUNet. Experiments on the ACDC dataset containing 4 classes of 3D MRI images and the Synapse multi-organ segmentation dataset containing 9 classes of CT images show that the proposed method improved the accuracy even with small training data, and we showed that the per