Keywords: computational pathology, mitosis detection, breast cancer
TL;DR: Crowd-sourced data, a measure of label uncertainty, and regression models perform well in a study of mitosis detection for computational pathology.
Abstract: Preparing data for machine learning tasks in health and life science applications requires decisions that affect the cost, model properties and performance. In this work, we study the implication of data collection strategies, focusing on a case study of mitosis detection. Specifically, we investigate the use of expert and crowd-sourced labelers, the impact of aggregated vs single labels, and the framing of the problem as either classification or object detection. Our results demonstrate the value of crowd-sourced labels, importance of uncertainty quantification, and utility of negative samples.