Transformer Based Multi-view Network for Mammographic Image Classification

MICCAI (3) 2022 (modified: 05 Nov 2022)
Abstract: Most existing multi-view mammographic image analysis methods adopt a simple fusion strategy, feature concatenation, which is widely used in feature fusion methods. However, concatenation-based methods cannot extract cross-view information effectively, because the different views are likely to be unaligned. Recently, many researchers have attempted to introduce attention mechanisms into multi-view mammography analysis, but these attention-based methods still partly rely on convolution and therefore cannot take full advantage of the attention mechanism. To exploit multi-view information fully, we propose a novel pure transformer based multi-view network for mammographic image classification. In our primary network, we use a transformer based backbone to extract image features, a "cross view attention block" to fuse multi-view information, and a "classification token" to gather all useful information for the final prediction. In addition, we compare the performance of fusing multi-view information at different stages of the backbone using a newly designed "(shifted) window based cross view attention block", and we compare the results of fusing different combinations of views. Results on the DDSM dataset show that our networks can effectively use multi-view information to make judgments and outperform concatenation and convolution based methods.
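
The "cross view attention block" described in the abstract fuses two views by letting tokens from one view attend to tokens from the other, rather than concatenating features. Below is a minimal sketch of such a block, assuming a PyTorch-style implementation; the class name CrossViewAttentionBlock, the dimensions, and the residual/MLP layout are illustrative assumptions for this listing, not the authors' released code.

import torch
import torch.nn as nn

class CrossViewAttentionBlock(nn.Module):
    """Illustrative cross-view attention: queries come from the current view,
    keys/values come from the other view, so each token can attend to
    (possibly unaligned) regions of the second view."""
    def __init__(self, embed_dim: int = 96, num_heads: int = 4):
        super().__init__()
        self.norm_q = nn.LayerNorm(embed_dim)
        self.norm_kv = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm_out = nn.LayerNorm(embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, 4 * embed_dim),
            nn.GELU(),
            nn.Linear(4 * embed_dim, embed_dim),
        )

    def forward(self, x: torch.Tensor, other_view: torch.Tensor) -> torch.Tensor:
        # Cross-attention: queries from x, keys/values from the other view.
        kv = self.norm_kv(other_view)
        fused, _ = self.attn(self.norm_q(x), kv, kv)
        x = x + fused                       # residual connection
        x = x + self.mlp(self.norm_out(x))  # feed-forward with residual
        return x

# Toy usage: 196 patch tokens per view, batch of 2 (view names are examples).
cc_tokens = torch.randn(2, 196, 96)   # e.g. craniocaudal view
mlo_tokens = torch.randn(2, 196, 96)  # e.g. mediolateral-oblique view
block = CrossViewAttentionBlock()
out = block(cc_tokens, mlo_tokens)
print(out.shape)  # torch.Size([2, 196, 96])

Because attention matches tokens by content rather than position, this kind of fusion does not require the two views to be spatially aligned, which is the limitation of concatenation the abstract points to. The paper's "(shifted) window based" variant would restrict this attention to local windows in the Swin Transformer style; that restriction is not shown in the sketch above.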