RaTEScore: A Metric for Entity-Aware Radiology Text Similarity

ACL ARR 2024 June Submission 4977 Authors

16 Jun 2024 (modified: 06 Aug 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: This paper proposes a new lightweight, entity-aware metric for assessing the accuracy of free-form medical text generated by AI models. Our metric, termed Radiological Report Text Evaluation (RaTEScore), is designed to focus on key medical entities, such as diagnostic outcomes and anatomies, while remaining robust to complex medical synonyms and sensitive to negation expressions. Technically, we establish a new large-scale medical NER dataset, RaTE-NER, and train an NER model on it. Leveraging this model, we decompose complex radiological reports into medical entities. We define the final metric by comparing entities through the similarity of their embeddings, computed from a language model, together with their corresponding entity types, which forces the metric to focus on clinically critical statements. In experiments, our score aligns more closely with human preference than other metrics, both on existing public benchmarks and on our newly proposed RaTE-Eval benchmark.
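The comparison step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the scoring formula, so the best-match cosine similarity, the type-mismatch weight of 0.5, the harmonic-mean aggregation, the entity type labels, and the two-dimensional toy embeddings below are all assumptions chosen for illustration. In the paper, entities come from the NER model trained on RaTE-NER and embeddings from a language model; here both are written by hand.

```python
import math
from typing import List, Tuple

# An entity is (text, type, embedding). All values below are hypothetical
# stand-ins for the output of an NER model and an embedding model.
Entity = Tuple[str, str, List[float]]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def directed_score(src: List[Entity], tgt: List[Entity],
                   type_mismatch_weight: float = 0.5) -> float:
    """Average, over entities in src, of the best weighted match in tgt.

    A match between entities of different types is down-weighted by
    type_mismatch_weight (an assumed value, not taken from the paper).
    Negative similarities are clamped to 0 because `best` starts at 0.0.
    """
    if not src or not tgt:
        return 0.0
    total = 0.0
    for _, s_type, s_emb in src:
        best = 0.0
        for _, t_type, t_emb in tgt:
            weight = 1.0 if s_type == t_type else type_mismatch_weight
            best = max(best, weight * cosine(s_emb, t_emb))
        total += best
    return total / len(src)


def entity_similarity(reference: List[Entity], candidate: List[Entity]) -> float:
    """Harmonic mean of the two directed scores (an assumed aggregation)."""
    recall = directed_score(reference, candidate)
    precision = directed_score(candidate, reference)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy reports: the candidate negates the finding, so its entity carries a
# different (hypothetical) type even though the surface embedding is the same.
reference = [
    ("pleural effusion", "ABNORMALITY", [1.0, 0.0]),
    ("left lung", "ANATOMY", [0.0, 1.0]),
]
negated = [
    ("pleural effusion", "ABSENT_ABNORMALITY", [1.0, 0.0]),
    ("left lung", "ANATOMY", [0.0, 1.0]),
]

print(entity_similarity(reference, reference))  # 1.0
print(entity_similarity(reference, negated))    # 0.75
```

In this toy setting, the negated report scores lower than an exact match only because the entity type differs, which is one plausible way that type information could make a metric sensitive to negation.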
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Clinical NLP; metrics; automatic evaluation
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 4977