Evaluation beyond y and p(y)

21 Apr 2022, 01:05 (modified: 04 Jun 2022, 12:07); MIDL 2022 Short Papers; Readers: Everyone
Keywords: Evaluation, AI in practice, hardware, software
TL;DR: This paper describes several metrics for AI models that do not focus only on predictions and labels. We encourage researchers to consider them during development.
Abstract: Academic papers and challenges focus mostly on metrics that measure how well a model's output p(y) approximates the labels y. However, high performance on these metrics is not a sufficient condition for a practically useful model. Looking into the complexity of a model, in terms of both hardware and software, can shed more light on its practical merit. This short paper discusses several measures for medical AI systems that do not focus solely on labels and predictions. We encourage the research community to consider these metrics more often.
