Context-Aware Patch Representations for Multiple Instance Learning

Andreas Lolos; Theofilos Christodoulou; Aris L. Moustakas; Stergios Christodoulidis; Maria Vakalopoulou

Published: 02 Jun 2026, Last Modified: 02 Jun 2026, Venue: Greeks in AI 2026, License: CC BY 4.0

Keywords: Digital Pathology, Multiple Instance Learning, Context Aware Representations

Domains: Vision and Learning, AI for Health

TL;DR: CAPRMIL shifts complexity from the MIL aggregator to context-aware patch representations via soft clustering, matching SOTA performance with up to 92.8% fewer parameters and 99% lower FLOPs.

External Link: https://openreview.net/pdf?id=tnj2zXKMkj

Abstract: In computational pathology, weak supervision has become the standard for deep learning because of the gigapixel scale of whole-slide images (WSIs) and the scarcity of pixel-level annotations, and Multiple Instance Learning (MIL) is established as the principal framework for slide-level model training. In this paper, we introduce CAPRMIL, a novel setting for MIL methods inspired by advances in neural Partial Differential Equation (PDE) solvers. Instead of relying on complex attention-based aggregation, we propose an efficient, aggregator-agnostic framework that removes the complexity of correlation learning from the MIL aggregator. CAPRMIL produces rich context-aware patch embeddings that promote effective correlation learning on downstream tasks. By projecting patch features (extracted using a frozen patch encoder) into a small set of global context/morphology-aware tokens and applying multi-head self-attention, CAPRMIL injects global context with computational complexity that is linear in the bag size. Paired with a simple Mean MIL aggregator, CAPRMIL matches state-of-the-art (SOTA) slide-level performance across multiple public pathology benchmarks, while reducing the number of trainable parameters by 48%-92.8% relative to SOTA MIL methods, lowering inference FLOPs by 52%-99%, and ranking among the best models in GPU memory efficiency and training time. Our results indicate that learning rich, context-aware instance representations before aggregation is an effective and scalable alternative to complex pooling for whole-slide analysis. Our code is available at: https://github.com/mandlos/CAPRMIL
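
The idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that); it is an assumed, simplified reading of the abstract: patches are soft-assigned to K context tokens, the tokens exchange information through self-attention, and the result is broadcast back to the patches before mean pooling. The function names, the weight matrices (`W_assign`, `W_q`, `W_k`, `W_v`, `W_cls`), the residual connections, and the use of a single attention head instead of multiple heads are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def context_aware_patches(X, W_assign, W_q, W_k, W_v):
    """Hypothetical sketch. X: (N, D) patch features from a frozen encoder.

    Returns context-enriched patch features of shape (N, D).
    """
    # Soft clustering: assign each patch to K context tokens, shape (N, K).
    A = softmax(X @ W_assign, axis=1)
    # Tokens are assignment-weighted means of the patches, shape (K, D).
    tokens = (A.T @ X) / (A.sum(axis=0)[:, None] + 1e-8)
    # Single-head self-attention among the K tokens (assumed simplification).
    # Its cost is O(K^2 * D) and does not depend on the bag size N.
    Q, Kmat, V = tokens @ W_q, tokens @ W_k, tokens @ W_v
    attn = softmax(Q @ Kmat.T / np.sqrt(Q.shape[1]), axis=1)
    tokens = tokens + attn @ V
    # Broadcast the global context back to every patch: O(N * K * D).
    return X + A @ tokens

def mean_mil(H, W_cls):
    # Mean MIL aggregator: average the instances, then apply a linear classifier.
    return H.mean(axis=0) @ W_cls

# Toy bag with random features; all sizes are arbitrary illustration values.
rng = np.random.default_rng(0)
N, D, K, C = 1000, 32, 8, 2
X = rng.normal(size=(N, D))
shapes = [(D, K), (D, D), (D, D), (D, D)]
params = [rng.normal(scale=0.1, size=s) for s in shapes]
H = context_aware_patches(X, *params)
logits = mean_mil(H, rng.normal(scale=0.1, size=(D, C)))
```

In this sketch every operation that touches all N patches is a matrix product with a K-column or K-row matrix, so the cost grows linearly with the bag size, while the quadratic attention cost is confined to the K tokens.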

Submission Number: 7
