Abstract: Automatic medical report generation has attracted considerable research attention because it can reduce radiologists' workload. Existing methods show promise, but they tend to overlook the multi-view or high-resolution nature of medical images, which carry rich and diverse information. To address this, we propose ReFuGen, a Multi-Slice Fusion report generation framework. ReFuGen first uses a Multi-slice Feature Extractor to extract features from multiple image slices. The Exemplary Slice Amplifier then adaptively identifies important slices, weights them by their contribution, and generates enhanced features by integrating spatial features from all slices. Next, the Adaptive Receptive Field Seq-Enhancer strengthens feature interactions across channels by adaptively applying a series of 1D convolutions. Finally, the Text Knowledge Integrator incorporates textual knowledge to compensate for sparse features in medical images. In experiments, ReFuGen outperforms state-of-the-art methods on three widely used benchmark datasets: IU X-Ray, MIMIC-CXR, and FFA-IR.
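The slice weighting and channel-wise 1D convolution described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the mean-activation importance score, the softmax weighting, and the fixed odd-length kernel are all assumptions chosen for illustration, and each slice is reduced to a flat feature vector rather than a spatial feature map.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_slices(slice_features):
    """Weight each slice by an importance score and sum them.

    slice_features: list of S feature vectors, each of length C.
    Assumption: importance is the mean activation of a slice; the
    paper's actual scoring function is not given in the abstract.
    """
    scores = [sum(f) / len(f) for f in slice_features]
    weights = softmax(scores)
    channels = len(slice_features[0])
    fused = [
        sum(w * f[c] for w, f in zip(weights, slice_features))
        for c in range(channels)
    ]
    return weights, fused


def channel_conv1d(features, kernel):
    """Zero-padded 1D convolution along the channel axis.

    Assumption: the kernel has odd length, so the output keeps the
    same number of channels as the input.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(features) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(features))
    ]


if __name__ == "__main__":
    slices = [[0.2, 0.4, 0.6], [1.0, 1.2, 1.4], [0.1, 0.1, 0.1]]
    weights, fused = fuse_slices(slices)
    enhanced = channel_conv1d(fused, [0.25, 0.5, 0.25])
    print(weights, fused, enhanced)
```

In this sketch the slice with the largest mean activation receives the largest weight, and the 1D kernel mixes each fused channel with its neighbours. A learned scoring network and several kernel sizes would replace both hypothetical choices in a real model.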
External IDs: dblp:conf/bibm/BuLD23