Multi-granularity Semantic Guided Transformer for Radiology Report Generation

Published: 01 Jan 2024, Last Modified: 20 Feb 2025, NLPCC (3) 2024, CC BY-SA 4.0
Abstract: Radiology report generation aims to generate accurate diagnostic reports from medical images. Existing approaches based on the Transformer paradigm and grid features have achieved significant performance. However, this paradigm inevitably loses fine-grained visual representations and ignores multi-level semantic information. Therefore, in this paper, we propose a Semantic Aware and Attention Refine Transformer (SA3RT) model that enhances radiology report generation by utilizing multi-granularity semantic information. Specifically, the semantic-aware unsupervised region recognition module relies on clustering algorithms to make efficient and effective use of grid features, so that the model focuses on global-local visual representations. Because different layers learn complementary semantic information, the attention-aware refinement module works in a multi-level fashion: it exploits the semantic relationships between multi-level attention-aware tokens to fuse low- and high-level semantic information. Experiments are performed on four radiology report generation datasets: COV-CTR, COVID-19 CT, IU X-Ray, and MIMIC-CXR. The results show that the SA3RT model is competitive with state-of-the-art methods, and further experiments demonstrate that it has strong generalization capability for radiology report generation across different disease domains. Code is available at https://github.com/Xiaojin-Hua/SA3RT.
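To make the idea of clustering grid features into regions concrete, the following is a minimal sketch and not the authors' implementation. It assumes plain k-means as the clustering algorithm, treats the cluster centroids as region-level features, and prepends a mean-pooled global feature. The function names `cluster_grid_features` and `global_local_tokens`, the number of clusters `k`, and the token layout are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np


def cluster_grid_features(grid, k=4, iters=10, seed=0):
    """Group grid features of shape (N, D) into k pseudo-regions via k-means.

    Assumption: plain k-means stands in for the unspecified clustering step.
    Returns (labels, centroids) with shapes (N,) and (k, D).
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct grid cells (fancy indexing copies).
    centroids = grid[rng.choice(len(grid), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every cell to every centroid: (N, k).
        dists = ((grid[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = grid[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return labels, centroids


def global_local_tokens(grid, k=4):
    """Stack one global token, k region tokens, and the N original grid tokens.

    Assumption: this layout is illustrative only.
    """
    _, regions = cluster_grid_features(grid, k=k)
    global_feat = grid.mean(axis=0, keepdims=True)  # (1, D)
    return np.concatenate([global_feat, regions, grid], axis=0)  # (1 + k + N, D)


# Example: a 7x7 grid of 8-dimensional features, flattened to 49 tokens.
grid = np.random.default_rng(1).normal(size=(49, 8))
labels, regions = cluster_grid_features(grid, k=4)
tokens = global_local_tokens(grid, k=4)
```

In this sketch the visual encoder would see coarse (global), mid-level (region), and fine (grid) tokens together, which is one plausible reading of "global-local visual representations"; the released code at the URL above is the authoritative reference.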