FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark

Mingjie Li; Wenjia Cai; Rui Liu; Yuetian Weng; Xiaoyun Zhao; Cong Wang; Xin Chen; Zhong Liu; Caineng Pan; Mengke Li; Yingfeng Zheng; Yizhi Liu; Flora D. Salim; Karin Verspoor; Xiaodan Liang; Xiaojun Chang

Published: 11 Oct 2021, Last Modified: 23 May 2023, NeurIPS 2021 Datasets and Benchmarks Track (Round 2)

Keywords: medical report generation, vision and language, fundus fluorescein angiography, explainable and reliable evaluation

TL;DR: Towards an explainable and reliable MRG benchmark based on fundus fluorescein angiography images and reports, which provides explainable annotations and reliable evaluation tools to facilitate the development of medical report generation methods.

Abstract: The automatic generation of long and coherent medical reports from medical images (e.g., chest X-ray and Fundus Fluorescein Angiography (FFA) images) has great potential to support clinical practice. Researchers have explored advanced methods from computer vision and natural language processing that incorporate medical domain knowledge to generate readable medical reports. However, existing medical report generation (MRG) benchmarks lack both explainable annotations and reliable evaluation tools, which hinders research progress in two ways: first, existing methods can only predict reports without accurate explanations, undermining the trustworthiness of the diagnostic methods; second, comparing the reports predicted by different MRG methods with natural language generation (NLG) evaluation metrics is unreliable. To address these issues, we propose an explainable and reliable MRG benchmark based on FFA Images and Reports (FFA-IR). FFA-IR is large, containing 10,790 reports along with 1,048,584 FFA images from clinical practice; it includes explainable annotations based on a schema of 46 lesion categories; and it is bilingual, providing both English and Chinese reports for each case. In addition to the widely used NLG metrics, we propose a set of nine human evaluation criteria for assessing the generated reports. We envision FFA-IR as a testbed for explainable and reliable medical report generation. We also hope that it will broadly accelerate medical imaging research and facilitate interaction between the fields of medical imaging, computer vision, and natural language processing.
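The claim that NLG metrics can rank reports unreliably can be illustrated with a small sketch. The code below is a generic, BLEU-style score (clipped n-gram precision for n = 1..2, geometric mean, brevity penalty, no smoothing). It is an assumption-laden illustration, not the FFA-IR evaluation tooling. The two report sentences are hypothetical and do not come from the dataset: dropping the single word "no" reverses the clinical finding, yet the candidate still scores about 0.87 against the reference.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """Modified n-gram precision: candidate n-gram counts are clipped by reference counts."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


def sentence_bleu(candidate, reference, max_n=2):
    """BLEU-style score for one candidate/reference pair (uniform weights, no smoothing)."""
    precisions = [clipped_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    c, r = len(candidate), len(reference)
    # Brevity penalty: candidates shorter than the reference are penalized.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * geo_mean


# Hypothetical report sentences: the candidate omits the negation "no".
reference = "no fluorescein leakage is seen in the macula".split()
candidate = "fluorescein leakage is seen in the macula".split()
score = sentence_bleu(candidate, reference)
print(round(score, 3))  # 0.867
```

Every unigram and bigram in the candidate also appears in the reference, so only the brevity penalty, exp(1 - 8/7), lowers the score. Surface-overlap metrics of this kind do not model negation. This motivates pairing them with human evaluation criteria.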

Open Credentialized Access: Our dataset is hosted on PhysioNet at the following link: https://doi.org/10.13026/ccbh-z832

Code: https://github.com/mlii0117/FFA-IR
