Synthetic Data Reveals Generalization Gaps in Correlated Multiple Instance Learning

Published: 27 Nov 2025, Last Modified: 28 Nov 2025
Venue: ML4H 2025 Poster
License: CC BY 4.0
Keywords: Multiple instance learning, Bayes estimator
TL;DR: We demonstrate the limitations of off-the-shelf MIL approaches by quantifying their performance compared to the optimal Bayes estimator on a novel synthetic dataset.
Track: Findings
Abstract: Multiple instance learning (MIL) is often used in medical imaging to classify high-resolution 2D images by processing patches, or to classify 3D volumes by processing slices. However, conventional MIL approaches treat instances separately, ignoring contextual relationships, such as the appearance of nearby patches or slices, that can be essential in real applications. We design a synthetic classification task in which accounting for adjacent instance features is crucial for accurate prediction. We demonstrate the limitations of off-the-shelf MIL approaches by quantifying their performance relative to the optimal Bayes estimator for this task, which is available in closed form. We empirically show that newer correlated MIL methods still fall short of the best possible performance even when trained on ten thousand samples, each containing many instances.
General Area: Models and Methods
Specific Subject Areas: Bayesian & Probabilistic Methods, Evaluation Methods & Validity
Data And Code Availability: Yes
Ethics Board Approval: No
Entered Conflicts: I confirm the above
Anonymity: I confirm the above
Code URL: https://github.com/tufts-ml/correlated-mil
Submission Number: 50
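To make the abstract's setup concrete, here is a minimal sketch of a synthetic correlated-MIL task in the spirit the authors describe. All names and the specific signal-planting scheme below are illustrative assumptions, not the paper's actual dataset (see the Code URL above for the authors' implementation). The idea: positive bags plant elevated signal on two *adjacent* instances, while negative bags spread the same total signal over two non-adjacent instances, so any pooling that ignores instance order cannot separate the classes.

```python
import numpy as np

def make_bag(n_instances=20, positive=True, rng=None):
    """Generate one bag of scalar instance features.

    Both classes receive the same total planted signal (+2 on two
    instances), so bag-level sum/mean pooling is uninformative; only
    the *adjacency* of the planted instances distinguishes the label.
    """
    rng = np.random.default_rng(rng)
    x = rng.normal(size=n_instances)
    if positive:
        # Plant signal on two adjacent instances.
        i = rng.integers(0, n_instances - 1)
        x[i] += 2.0
        x[i + 1] += 2.0
    else:
        # Plant the same signal on two non-adjacent instances.
        i, j = rng.choice(n_instances, size=2, replace=False)
        while abs(int(i) - int(j)) == 1:
            i, j = rng.choice(n_instances, size=2, replace=False)
        x[i] += 2.0
        x[j] += 2.0
    return x

def make_dataset(n_bags=1000, n_instances=20, seed=0):
    """Stack bags into an (n_bags, n_instances) array with labels."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_bags)
    X = np.stack([make_bag(n_instances, bool(lab), rng) for lab in y])
    return X, y
```

Under this construction, a model that scores adjacent pairs (e.g., via the maximum sum over neighboring instances) separates the classes easily, while order-agnostic attention or mean pooling over instances does not, which is the gap the paper quantifies against the closed-form Bayes estimator.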