Keywords: Evaluation, AI in practice, hardware, software
TL;DR: This paper describes several metrics for AI models that do not focus only on predictions and labels. We encourage researchers to consider these metrics during development.
Abstract: Academic papers and challenges focus mostly on metrics that measure how well a model's output p(y) approximates labels y. However, high performance on these metrics is not a sufficient condition for a practically useful model. Looking into the complexity of a model, in terms of both hardware and software, can shed more light on its practical merit. This short paper discusses several measures for medical AI systems that do not focus solely on labels and predictions. We encourage the research community to consider these metrics more often.
Registration: I acknowledge that acceptance of this work at MIDL requires at least one of the authors to register and present the work during the conference.
Authorship: I confirm that I am the author of this work and that it has not been submitted to another publication before.
Paper Type: novel methodological ideas without extensive validation
Primary Subject Area: Validation Study
Secondary Subject Area: Application: Other
Confidentiality And Author Instructions: I read the call for papers and author instructions. I acknowledge that exceeding the page limit and/or altering the latex template can result in desk rejection.