Abstract: Classical multiple instance learning (MIL) methods often rest on the assumption that instances are independent and identically distributed, hence neglecting the potentially rich contextual information beyond individual entities. On the other hand, Transformers with global self-attention modules have been proposed to model the interdependencies among all instances. However, in this paper we question: Is global relation modeling using self-attention necessary, or can we appropriately restrict self-attention calculations to local regimes in large-scale whole slide images (WSIs)? We propose a general-purpose local attention graph-based Transformer for MIL (LA-MIL), introducing an inductive bias by explicitly contextualizing instances in adaptive local regimes of arbitrary size. Additionally, an efficiently adapted loss function enables our approach to learn expressive WSI embeddings for the joint analysis of multiple biomarkers. We demonstrate that LA-MIL achieves state-of-the-art results in mutation prediction for gastrointestinal cancer, outperforming existing models on important biomarkers such as microsatellite instability for colorectal cancer. Our findings suggest that local self-attention suffices to model dependencies on par with global modules. Our LA-MIL implementation is available at https://github.com/agentdr1/LA_MIL
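
To make the idea of restricting self-attention to local regimes concrete, the following is a minimal sketch, not the authors' implementation: the kNN graph construction over patch coordinates, the feature dimensions, and the single-head, projection-free attention are all illustrative assumptions.

```python
import numpy as np

def knn_adjacency(coords, k=8):
    """Build a k-nearest-neighbour adjacency over WSI patch coordinates.

    coords: (N, 2) array of patch centroid positions.
    Returns a boolean (N, N) mask; True marks an allowed attention edge.
    """
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    neighbours = np.argsort(dists, axis=1)[:, :k + 1]   # self + k nearest patches
    mask = np.zeros(dists.shape, dtype=bool)
    np.put_along_axis(mask, neighbours, True, axis=1)
    return mask

def local_self_attention(x, mask):
    """Single-head self-attention restricted to the local graph.

    x: (N, D) patch features; mask: (N, N) boolean adjacency.
    Learned query/key/value projections are omitted for brevity.
    """
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores = np.where(mask, scores, -np.inf)             # attend only to graph neighbours
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

# Toy usage: 100 patches with 64-dimensional features
coords = np.random.rand(100, 2)
feats = np.random.randn(100, 64)
contextualized = local_self_attention(feats, knn_adjacency(coords, k=8))
```

Because each patch only attends to its k spatial neighbours, the attention mask stays sparse regardless of slide size, which is what makes local contextualization tractable for WSIs with thousands of instances.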