Keywords: Deep learning, machine learning, histopathology, digital pathology, cancer
Abstract: Training deep neural networks on extremely large inputs, such as gigapixel Whole Slide Images (WSIs) in digital pathology, poses significant challenges due to GPU memory constraints. Multiple Instance Learning (MIL) circumvents this limitation by treating a WSI as a bag of patches and aggregating their embeddings. However, the encoder used to obtain these patch embeddings is usually a generic pre-trained deep neural network. In this paper, we propose a training strategy that enables training the encoder by dynamically offloading a layer's intermediate activations to CPU RAM, allowing the layer to process inputs that do not fit in GPU memory. We demonstrate the effectiveness of our approach on the PANDA and CAMELYON datasets using popular MIL approaches. Experimental results indicate that our method improves the Quadratic Weighted Kappa (QWK) on PANDA by 7–15 percentage points over ResNet-18 baselines with frozen encoders. Evaluations on external test sets further suggest better generalisation, and in some configurations our models even outperform foundation-model encoders on TCGA-PRAD. The code will be made publicly available upon publication.
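A minimal sketch of the activation-offloading idea the abstract describes, assuming PyTorch: the built-in `torch.autograd.graph.save_on_cpu` context manager keeps the tensors saved for the backward pass in CPU RAM and copies them back to the GPU only when gradients are computed. The paper's per-layer scheduling is not reproduced here; the encoder architecture and batch size below are illustrative assumptions.

```python
import torch
from torch import nn

# Illustrative patch encoder (a stand-in for e.g. a ResNet-18 backbone).
encoder = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 128, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
).cuda()

# A bag of patches large enough that storing all intermediate
# activations on the GPU would normally exhaust its memory
# (the size here is purely illustrative).
patches = torch.randn(2048, 3, 224, 224, device="cuda")

# Inside this context, activations saved for backward are moved to
# CPU RAM; pin_memory=True speeds up the later copies back to GPU.
with torch.autograd.graph.save_on_cpu(pin_memory=True):
    embeddings = encoder(patches)

loss = embeddings.mean()
loss.backward()  # saved activations stream back from CPU as needed
```

The trade-off is extra CPU-GPU transfer time in exchange for a much smaller GPU-resident activation footprint, which is what makes end-to-end training of the patch encoder feasible.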
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 20705