RaTEScore: A Metric for Entity-Aware Radiology Text Similarity

Abstract: This paper proposes a lightweight, entity-aware metric for assessing the accuracy of free-form medical text generated by AI models. Our metric, termed Radiological Report Text Evaluation (RaTEScore), focuses on key medical entities, such as diagnostic outcomes and anatomies, while remaining robust to complex medical synonyms and sensitive to negation expressions. Technically, we build a new large-scale medical NER dataset, RaTE-NER, and train an NER model on it. Using this model, we decompose complex radiological reports into their constituent medical entities. We define the final metric by comparing entity embeddings computed from a language model, together with their corresponding entity types, which forces the metric to focus on clinically critical statements. In experiments, our score aligns better with human preference than other metrics, both on existing public benchmarks and on our newly proposed RaTE-Eval benchmark.
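
To make the idea of comparing typed entities concrete, the following is a minimal sketch, not the RaTEScore implementation. It assumes the reports have already been decomposed into (entity text, entity type) pairs, so the NER step is not shown. The function `embed` is a hypothetical stand-in that uses character trigram counts in place of a language-model encoder. The type labels, the constant `TYPE_MISMATCH_WEIGHT`, and the F1-style aggregation are illustrative assumptions and are not taken from the abstract.

```python
import math
from collections import Counter

# Assumed down-weighting when two matched entities have different types.
TYPE_MISMATCH_WEIGHT = 0.5


def embed(text):
    """Hypothetical stand-in for a language-model encoder: trigram counts."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def directed_score(source, target):
    """Average, over source entities, of the best type-weighted match in target."""
    if not source or not target:
        return 0.0
    total = 0.0
    for text, etype in source:
        best = 0.0
        for other_text, other_type in target:
            sim = cosine(embed(text), embed(other_text))
            if other_type != etype:
                sim *= TYPE_MISMATCH_WEIGHT
            best = max(best, sim)
        total += best
    return total / len(source)


def entity_similarity(reference, candidate):
    """Harmonic mean of the two directed scores (an assumed aggregation)."""
    recall = directed_score(reference, candidate)
    precision = directed_score(candidate, reference)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hand-written entities; in practice these would come from an NER model.
reference = [("pleural effusion", "abnormality"), ("left lung", "anatomy")]
negated = [("pleural effusion", "non_abnormality"), ("left lung", "anatomy")]

print(entity_similarity(reference, reference))
print(entity_similarity(reference, negated))
```

In this sketch, a candidate that mentions the same finding but negates it carries a different entity type, so it scores lower than an exact match even though the entity text is identical. This is one simple way in which type information could make an entity-level score sensitive to negation.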
