Mitigating Data Imbalance in Medical Report Generation Through Visual Data Resampling

Published: 01 Jan 2024, Last Modified: 06 Jun 2025, Venue: ICIC (LNBI 2) 2024, License: CC BY-SA 4.0
Abstract: Accurate medical report generation plays an important role in effective healthcare communication and precise patient treatment. However, a significant challenge arises from imbalanced data: within the unhealthy data, the frequency of different diseases varies considerably. This imbalance hampers model learning and results in sub-optimal performance on rare diseases. In this paper, we propose BERT-VDR, a novel approach that couples a BERT-based single-stream encoder with a Visual Data Resampling (VDR) module to mitigate data imbalance in medical report generation. Specifically, we employ multi-label data resampling (MLSMOTE) to identify the nearest neighbors among minority-class samples and create new instances through linear interpolation. By integrating this resampling with a classification task during pre-training, we aim to enhance the semantic precision of visual feature representations and mitigate the degradation in learning performance. We validate the method on two prominent medical imaging datasets, MIMIC-CXR and IU X-Ray, where it outperforms the baseline model and achieves state-of-the-art results across multiple metrics. Our findings highlight the potential of data resampling for improving medical report generation under imbalanced data distributions.
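
The resampling step described in the abstract (nearest neighbors among minority-class samples, then linear interpolation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `interpolate_minority`, its parameters, the use of Euclidean distance, and the majority-vote rule for assigning labels to synthetic samples are all assumptions made for illustration, and plain NumPy arrays stand in for the visual features.

```python
import numpy as np


def interpolate_minority(features, labels, minority_label, k=3, n_new=4, seed=0):
    """Hypothetical helper: synthesize samples for one minority label.

    features: (n_samples, n_dims) float array (stand-in for visual features)
    labels:   (n_samples, n_labels) binary multi-label matrix
    """
    rng = np.random.default_rng(seed)

    # Rows that carry the minority label.
    idx = np.flatnonzero(labels[:, minority_label] == 1)
    minority = features[idx]
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")

    # Pairwise Euclidean distances among minority samples only (assumed metric).
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]

    new_x, new_y = [], []
    for _ in range(n_new):
        i = rng.integers(len(minority))       # seed sample
        j = neighbors[i, rng.integers(k)]     # one of its k nearest neighbors
        lam = rng.random()                    # interpolation weight in [0, 1)
        # Linear interpolation between the seed and the chosen neighbor.
        new_x.append(minority[i] + lam * (minority[j] - minority[i]))
        # Assumed label rule: keep labels held by more than half of the
        # seed and its k neighbors.
        group = idx[np.append(neighbors[i], i)]
        votes = labels[group].sum(axis=0)
        new_y.append((votes > (k + 1) / 2).astype(labels.dtype))

    return np.array(new_x), np.array(new_y)
```

Because each synthetic point is a convex combination of two real minority samples, it stays inside the region those samples already occupy. In a pipeline like the one the abstract describes, such synthetic features and labels would presumably be added to the data used for the classification task during pre-training.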