Rethinking Texture Patterns in Transformer Neural Networks for Medical Image Analysis

19 Sept 2023 (modified: 11 Feb 2024) Submitted to ICLR 2024
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: machine learning, deep learning, Transformer, texture, lesion differentiation
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Lesion identification is a central goal of computer-aided diagnosis (CAD) and one of the key tasks in radiomics. This study explores the potential of transformer neural networks by introducing texture patterns and features to fine-tune the learning model for differentiating lesions from benign tissue. We propose the texture transformer network (TxTN), which integrates three texture layers into the vision transformer (ViT) to enhance its discriminative capability for medical image analysis. The idea stems from an insight into the architecture of ViT and its major shortcomings, including topological destruction, the loss of geometric information, and the lack of global characteristics. Given the definition and properties of image texture, ViT and texture patterns are strongly complementary, since locality and globality are two basic requirements of the latter. Moreover, many well-known texture patterns embed naturally into the attention mechanism because they are represented as vectors or matrices, such as the gray level co-occurrence matrix (GLCM) and the histogram. We therefore devised a practical way to combine them: texture pattern layers and a histogram layer are embedded into the transformer network as substitutes for the pixel projection layer, which is the main source of the drawbacks above. This combination not only exploits the complementary advantages of texture and ViT, but also offers strong potential for fine-tuning deep learning models by mining more heterogeneous properties from patterns rather than pixels across imaging modalities. Consequently, many existing texture patterns can be reused in our approach, such as GLCM and vector quantization (VQ). In this preliminary study, TxTN incorporates three texture patterns: GLCM, VQ, and the Laplacian. To evaluate its effectiveness, our approach is validated on two public medical datasets and demonstrates strong performance.
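Since the paper body is not shown here, the sketch below is only one plausible reading of the substitution the abstract describes: a per-patch texture statistic (here, a GLCM) replacing ViT's linear pixel projection as the token embedding. The class name `GLCMPatchEmbedding`, the gray-level count, the horizontal co-occurrence offset, and all dimensions are illustrative assumptions, not the authors' implementation; the VQ, Laplacian, and histogram layers from the abstract are omitted for brevity.

```python
import torch
import torch.nn as nn

class GLCMPatchEmbedding(nn.Module):
    """Hypothetical texture-pattern token embedding (assumption, not the
    paper's code): each image patch is summarized by a normalized gray level
    co-occurrence matrix (GLCM), flattened and projected to the transformer
    width, in place of ViT's linear pixel projection."""

    def __init__(self, patch_size=16, levels=8, embed_dim=256):
        super().__init__()
        self.patch_size = patch_size
        self.levels = levels
        self.proj = nn.Linear(levels * levels, embed_dim)

    def glcm(self, patch):
        # patch: (P, P) grayscale in [0, 1]; quantize to `levels` gray levels
        q = (patch * (self.levels - 1)).round().long()
        # co-occurrence of horizontally adjacent pixels (offset (0, 1) only)
        pairs = q[:, :-1] * self.levels + q[:, 1:]
        counts = torch.bincount(pairs.flatten(),
                                minlength=self.levels ** 2).float()
        return counts / counts.sum().clamp(min=1)  # normalized GLCM, (levels^2,)

    def forward(self, images):
        # images: (B, 1, H, W) grayscale, H and W divisible by patch_size
        B, _, H, W = images.shape
        P = self.patch_size
        patches = images.unfold(2, P, P).unfold(3, P, P).reshape(B, -1, P, P)
        feats = torch.stack([
            torch.stack([self.glcm(p) for p in img]) for img in patches
        ])  # (B, N, levels^2) texture descriptors, one per patch
        return self.proj(feats)  # (B, N, embed_dim) token sequence for a ViT

# Usage: tokens = GLCMPatchEmbedding()(torch.rand(2, 1, 64, 64))  # (2, 16, 256)
```

Because the GLCM is computed from co-occurring gray-level pairs rather than raw pixel positions, the resulting tokens carry local texture statistics directly, which is one way the abstract's claim of "mining properties from patterns instead of pixels" could be realized.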
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1673