Can Frozen Transformer in Large Language Model Help with Medical Image Segmentation?

Published: 27 Apr 2024, Last Modified: 27 Apr 2024, MIDL 2024 Short Papers, CC BY 4.0
Keywords: Medical image segmentation, LLMs, Frozen transformers, Trans-UNet
Abstract: Transformer models excel in medical image segmentation by harnessing their self-attention mechanism to capture global information, thus boosting segmentation accuracy. Recent research has shown that large language models (LLMs), trained solely on text, can surprisingly benefit visual tasks even without any language input, through a simple strategy: integrating a frozen transformer block from a pre-trained LLM as a direct visual token processor. This paper applies this approach to medical image segmentation by combining frozen transformer blocks with Trans-UNet. Experiments are conducted on the BTCV, ACDC, ISIC 2017, CVC-ClinicDB, and BUSI datasets, demonstrating modest improvements in segmentation performance.
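To make the described strategy concrete, below is a minimal PyTorch sketch (not the authors' code) of inserting a frozen transformer block between a ViT-style encoder and a decoder. The names `FrozenLLMTokenProcessor`, `proj_in`, and `proj_out` are hypothetical; a real implementation would load one block from a pre-trained LLM (e.g., a LLaMA layer) in place of the untrained stand-in used here so the example runs end to end, and the residual connection is one plausible wiring rather than a confirmed detail of the paper.

```python
import torch
import torch.nn as nn

class FrozenLLMTokenProcessor(nn.Module):
    """Routes visual tokens through a frozen LLM transformer block.

    Only the two linear adapters (matching visual and LLM hidden
    dimensions) are trainable; the LLM block itself stays frozen.
    """
    def __init__(self, vis_dim: int, llm_dim: int, llm_block: nn.Module):
        super().__init__()
        self.proj_in = nn.Linear(vis_dim, llm_dim)    # trainable adapter in
        self.proj_out = nn.Linear(llm_dim, vis_dim)   # trainable adapter out
        self.llm_block = llm_block
        for p in self.llm_block.parameters():         # freeze the LLM layer
            p.requires_grad = False

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_visual_tokens, vis_dim)
        h = self.proj_in(tokens)
        h = self.llm_block(h)                         # frozen token processing
        return tokens + self.proj_out(h)              # residual (assumed wiring)

# Stand-in for a real pre-trained LLM layer; untrained here so the
# sketch is self-contained and runnable.
llm_block = nn.TransformerEncoderLayer(d_model=1024, nhead=16, batch_first=True)
processor = FrozenLLMTokenProcessor(vis_dim=768, llm_dim=1024, llm_block=llm_block)
x = torch.randn(2, 196, 768)   # e.g., 14x14 patch tokens from a Trans-UNet encoder
print(processor(x).shape)      # torch.Size([2, 196, 768])
```

In the Trans-UNet setting, such a module would sit between the transformer encoder and the upsampling decoder, leaving the rest of the segmentation pipeline unchanged.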
Submission Number: 180