Context Matters: Query-aware Dynamic Long Sequence Modeling of Gigapixel Images

Zhengrui Guo; Qichen Sun; Jiabo MA; Lishuang Feng; Jinzhuo Wang; Hao Chen

Published: 01 May 2025, Last Modified: 23 Jul 2025. Venue: ICML 2025 poster. License: CC BY-NC 4.0.

Abstract: Whole slide image (WSI) analysis presents significant computational challenges due to the massive number of patches in gigapixel images. While transformer architectures excel at modeling long-range correlations through self-attention, their quadratic computational complexity makes them impractical for computational pathology applications. Existing solutions like local-global or linear self-attention reduce computational costs but compromise the strong modeling capabilities of full self-attention. In this work, we propose **Querent**, *i.e.*, the **quer**y-awar**e** long co**nt**extual dynamic modeling framework, which achieves a theoretically bounded approximation of full self-attention while delivering practical efficiency. Our method adaptively predicts which surrounding regions are most relevant for each patch, enabling focused yet unrestricted attention computation only with potentially important contexts. By using efficient region-wise metadata computation and importance estimation, our approach dramatically reduces computational overhead while preserving global perception to model fine-grained patch correlations. Through comprehensive experiments on biomarker prediction, gene mutation prediction, cancer subtyping, and survival analysis across more than 10 WSI datasets, our method demonstrates superior performance compared to state-of-the-art approaches. Code is available at https://github.com/dddavid4real/Querent.

Lay Summary: Medical diagnosis increasingly relies on analyzing enormous microscopic images of tissue samples that contain millions of tiny patches. Current AI systems struggle with these "whole slide images" because they try to compare every patch with every other patch, which requires massive computational power. We developed a new AI approach called "Querent" that works more like human pathologists by intelligently focusing on the most relevant parts of each image rather than analyzing everything at once. Our method divides large images into regions, predicts which regions contain the most important information for each area being examined, and performs detailed analysis only between relevant sections. When tested on over 10 medical datasets for tasks like cancer detection, gene mutation prediction, and patient survival estimation, our approach consistently outperformed existing methods while remaining computationally efficient. This could make advanced AI-powered pathology analysis more accessible to hospitals worldwide, potentially improving diagnostic accuracy and patient care while reducing costs.
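The three steps described in the summary (divide into regions, estimate which regions matter for each query, attend only within the selected regions) can be illustrated with a toy sketch. This is not the Querent implementation from the linked repository. The function name `region_selective_attention`, the use of a mean feature vector as the region summary, and the dot-product relevance score are all assumptions made for illustration only.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def region_selective_attention(patches, region_size, top_k):
    """Hypothetical helper: each patch attends only to the patches in its
    top_k highest-scoring regions instead of to every patch."""
    # Step 1: divide the patch sequence into contiguous regions.
    regions = [list(range(i, min(i + region_size, len(patches))))
               for i in range(0, len(patches), region_size)]
    # Step 2: summarize each region (assumption: the mean feature vector).
    summaries = [mean_vector([patches[i] for i in idx]) for idx in regions]
    scale = 1.0 / math.sqrt(len(patches[0]))
    outputs, selected = [], []
    for q in patches:
        # Step 3: score every region against this query and keep the best top_k.
        scores = [dot(q, s) for s in summaries]
        keep = sorted(range(len(regions)), key=lambda r: scores[r], reverse=True)[:top_k]
        keep.sort()
        # Step 4: ordinary softmax attention, restricted to the kept regions.
        keys = [i for r in keep for i in regions[r]]
        weights = softmax([dot(q, patches[i]) * scale for i in keys])
        out = [sum(w * patches[i][d] for w, i in zip(weights, keys))
               for d in range(len(q))]
        outputs.append(out)
        selected.append(keep)
    return outputs, selected

# Six toy 2-D "patch features" forming three visually distinct regions.
patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, 0.0], [-0.9, -0.1]]
outputs, selected = region_selective_attention(patches, region_size=2, top_k=1)
```

In this sketch each patch compares itself against one summary per region and then against `top_k * region_size` patches, rather than against all patches, which is where the savings over full self-attention would come from. Setting `top_k` to the number of regions recovers full attention over the whole sequence.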

Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.

Link To Code: https://github.com/dddavid4real/Querent

Primary Area: Applications->Health / Medicine

Keywords: Computational Pathology, Whole Slide Image, Cancer Diagnosis and Prognosis

Submission Number: 10
