Prospectors: Leveraging Short Contexts to Mine Salient Objects in High-dimensional Imagery

Published: 20 Jun 2023 · Last Modified: 19 Jul 2023 · IMLH 2023 Poster
Keywords: Explainable AI, Interpretable ML, visual grounding, Foundation Models
TL;DR: Adaptation can help visually ground image encoders
Abstract: High-dimensional imagery contains the high-resolution information that end users need for decision-making. Due to computational constraints, current methods for image-level classification are designed to train on image chunks or down-sampled images rather than the full high-resolution context. While these methods achieve impressive classification performance, they often lack visual grounding and, thus, the post hoc capability to identify class-specific, salient objects under weak supervision. In this work, we (1) propose a formalized evaluation framework to assess visual grounding in high-dimensional image applications. To present a challenging benchmark, we leverage a real-world segmentation dataset for post hoc mask evaluation. We use this framework to characterize the visual grounding of various baseline methods across multiple encoder classes, exploring multiple supervision regimes and architectures (e.g., ResNet, ViT). Finally, we (2) present prospector heads: a novel class of adaptation architectures designed to improve visual grounding. Prospectors leverage chunk heterogeneity to identify salient objects over long ranges and can interface with any image encoder. We find that prospectors outperform baselines by upwards of +6 balanced-accuracy points and +30 precision points in a gigapixel pathology setting. Through this experimentation, we also show how prospectors enable many classes of encoders to identify salient objects without retraining, and we demonstrate their improved performance over classical explanation techniques (e.g., attention maps).
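To make the adaptation pattern concrete, below is a minimal PyTorch sketch of a prospector-style head operating on chunk embeddings from a frozen encoder. This is an illustration under assumptions, not the paper's released implementation: the class name ProspectorStyleHead, the linear scoring layer, the embedding dimensions, and the max-pooling aggregation are all hypothetical choices, and the actual prospector architecture may differ.

```python
# Minimal sketch (illustrative, not the authors' released code) of the
# adaptation pattern the abstract describes: a frozen chunk encoder plus a
# lightweight head that scores chunks for saliency and pools those scores
# into a weakly supervised image-level prediction.
import torch
import torch.nn as nn

class ProspectorStyleHead(nn.Module):
    """Scores each chunk embedding; the pooled score serves as the image logit."""
    def __init__(self, embed_dim: int):
        super().__init__()
        self.score = nn.Linear(embed_dim, 1)  # per-chunk saliency logit (assumed form)

    def forward(self, chunk_embeds: torch.Tensor):
        # chunk_embeds: (num_chunks, embed_dim) from any frozen image encoder
        chunk_logits = self.score(chunk_embeds).squeeze(-1)  # (num_chunks,)
        image_logit = chunk_logits.max()  # trained with image-level labels only
        return image_logit, chunk_logits  # chunk_logits double as a saliency map

# Usage: embeddings from a frozen encoder (e.g., ResNet or ViT applied per chunk)
embeds = torch.randn(256, 768)           # 256 chunks, 768-dim embeddings (assumed)
head = ProspectorStyleHead(embed_dim=768)
image_logit, saliency = head(embeds)     # saliency can be thresholded into a mask
```

Because the head trains only on image-level labels, the per-chunk scores act as post hoc localizations, which is what a segmentation dataset can then evaluate as masks.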
Submission Number: 80