Abstract: Color normalization is one of the main tasks in the processing pipeline of computer-aided diagnosis (CAD) systems in histopathology. This task reduces the color and intensity variations that are typically present in stained whole-slide images (WSIs) due to, e.g., non-standardized staining protocols. Moreover, it increases the accuracy of machine learning (ML) based CAD systems. Given the vast amount of gigapixel-sized WSI data and the need to reduce the time-to-insight, there is an increasing demand for efficient ML systems. In this work, we present a high-performance pipeline that enables big data analytics for WSIs in histopathology. As an exemplary ML inference pipeline, we employ a convolutional neural network (CNN), used to detect prostate cancer in WSIs, with stain normalization as a preprocessing step. We introduce a set of optimizations across the whole pipeline: (i) we parallelize and optimize the stain normalization process, (ii) we introduce a multi-threaded I/O framework optimized for fast non-volatile memory (NVM) storage, and (iii) we integrate the stain normalization optimizations and the enhanced I/O framework into the ML pipeline to minimize the data transfer overheads and the overall prediction time. Our combined optimizations accelerate the end-to-end ML pipeline by 7.2× and 21.2×, on average, for low and high resolution levels of WSIs, respectively. Notably, this allows for a seamless integration of ML-assisted diagnosis with state-of-the-art whole-slide scanners, by reducing the prediction time for high-resolution histopathology images from about 30 min to under 80 s.
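As a rough illustration of optimization (i), the sketch below normalizes independent image tiles concurrently. It is not the method used in this work: the abstract does not name the normalization algorithm, so a simple mean and standard deviation matching against a reference is assumed here, and the names `normalize_tile`, `normalize_tiles`, `TARGET_MEAN`, and `TARGET_STD` are hypothetical. Tiles are modeled as flat lists of intensities in a single channel to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import fmean, pstdev

# Hypothetical target statistics, standing in for a reference tile's
# intensity mean and standard deviation (assumed values, not from the paper).
TARGET_MEAN, TARGET_STD = 0.5, 0.1


def normalize_tile(tile):
    """Shift and scale one tile's intensities to the target mean and std."""
    mean = fmean(tile)
    std = pstdev(tile)
    if std == 0.0:
        # A constant tile has no contrast to rescale; map it to the target mean.
        return [TARGET_MEAN for _ in tile]
    scale = TARGET_STD / std
    return [(value - mean) * scale + TARGET_MEAN for value in tile]


def normalize_tiles(tiles, workers=4):
    """Normalize tiles concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(normalize_tile, tiles))


if __name__ == "__main__":
    tiles = [[0.1, 0.2, 0.3, 0.4], [0.6, 0.7, 0.9, 1.0], [0.5, 0.5, 0.5, 0.5]]
    for normalized in normalize_tiles(tiles):
        print([round(value, 3) for value in normalized])
```

The thread pool here only shows the structure of tile-level parallelism. In pure Python the arithmetic is serialized by the interpreter lock, so a real speedup would require the per-tile work to run in native code or to overlap with storage reads, as a multi-threaded I/O layer would.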