Sci2Txt: Automated report generation of low-resolution SPECT bone scintigrams using spatial-position-aware and hierarchical features

Published: 01 Jan 2026, Last Modified: 25 Sept 2025 | Biomed. Signal Process. Control. 2026 | CC BY-SA 4.0
Abstract: In clinical practice, writing diagnostic reports from low-resolution, large-scale bone scintigrams poses a significant burden on nuclear medicine physicians. While deep learning-based automated report generation has shown promise in reducing diagnostic oversights and alleviating physician workload, most existing methods are tailored for high-resolution X-ray images. However, bone scintigrams exhibit unpredictable characteristics in location, size, and shape, which complicates accurate report generation. To address this challenge, we introduce Sci2Txt, a novel encoder-decoder architecture for the automated generation of diagnostic reports from bone scintigrams. Sci2Txt incorporates three innovative components: (1) a Spatial-Position Visual Feature Extractor (SPVFE) that captures multi-scale spatial position information from low-resolution images; (2) a Hierarchical Fusion Encoder (HFE) that integrates low- and high-level semantic features through cross-level feature splicing and nonlinear transformations; and (3) a Memory-driven Transformer Decoder (MTD) that generates coherent and clinically accurate reports. Evaluated on a dataset of 2,091 clinical bone scintigrams, Sci2Txt outperforms existing methods in both traditional natural language generation metrics and the newly proposed hierarchical Clinical Efficacy (CE) metric. By enabling efficient and accurate automated reporting, Sci2Txt offers a practical solution for enhancing diagnostic workflows in the detection of bone metastases.
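
The "cross-level feature splicing and nonlinear transformations" attributed to the HFE can be pictured with a small sketch. The abstract does not specify the actual layer design, so everything here is an assumption made for illustration: the function names (`fuse_levels`, `linear`, `init_weights`), the feature dimensions, the use of a single linear projection, and the choice of ReLU as the nonlinearity. It shows only the general idea of concatenating low- and high-level feature vectors at each spatial position and passing the result through a nonlinear transformation, written in plain Python so it runs without external libraries.

```python
import math
import random


def linear(x, weights, bias):
    """Apply y = W x + b to one feature vector stored as a plain list."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise ReLU (an assumed choice of nonlinearity)."""
    return [max(0.0, v) for v in x]


def fuse_levels(low, high, weights, bias):
    """Hypothetical fusion step: splice (concatenate) the low- and high-level
    feature vectors at each position, then apply a linear projection and ReLU."""
    if len(low) != len(high):
        raise ValueError("both levels must cover the same number of positions")
    fused = []
    for low_vec, high_vec in zip(low, high):
        spliced = low_vec + high_vec  # list concatenation = feature splicing
        fused.append(relu(linear(spliced, weights, bias)))
    return fused


def init_weights(out_dim, in_dim, seed=0):
    """Small random weights for the toy example (not trained parameters)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]


# Toy input: 2 spatial positions, 3 low-level and 2 high-level channels each.
low = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
high = [[1.0, -1.0], [0.5, 0.25]]
weights = init_weights(out_dim=4, in_dim=5)
bias = [0.0] * 4
fused = fuse_levels(low, high, weights, bias)
```

In this sketch `fused` holds one 4-dimensional vector per position. A real encoder would presumably operate on learned tensors in a deep learning framework and may fuse more than two levels.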