Increasing Information Extraction in Low-Signal Regimes via Multiple Instance Learning

ICLR 2026 Conference Submission15863 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0

Keywords: Multiple Instance Learning, High-Energy Physics, Hypothesis Testing, Fisher Information, Parameter Estimation, Simulation-based Inference

TL;DR: We introduce an information-theoretic view of Multiple Instance Learning for its use in parameter estimation problems in High-Energy Physics, and demonstrate improved performance.

Abstract: In this work, we introduce a new information-theoretic perspective on Multiple Instance Learning (MIL) for parameter estimation with i.i.d. data, and show that MIL can outperform single-instance learners in low-signal regimes. Prior work \citep{nachman_learning_2021} argued that per-instance methods are often sufficient, but this conclusion presumes enough per-instance signal to train near-optimal classifiers. We demonstrate that even state-of-the-art per-instance models can fail to reach optimal classifier performance in challenging low-signal regimes, whereas MIL can mitigate this sub-optimality. As a concrete application, we constrain Wilson coefficients of the Standard Model Effective Field Theory (SMEFT) using kinematic information from subatomic particle collision events at the Large Hadron Collider (LHC). In experiments, we observe that under specific modeling and weak signal conditions, pooling instances can increase the effective Fisher information compared to single-instance approaches.
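To make the i.i.d. setting concrete, here is a minimal sketch on a toy Gaussian model, not the SMEFT setup of the submission. It illustrates two standard facts behind the comparison of per-instance and bag-level learners: the log-likelihood ratio of a bag of i.i.d. instances is the sum of the per-instance log-likelihood ratios, and the Fisher information of the bag is the per-instance Fisher information times the bag size. All names (`instance_llr`, `bag_llr`, `bag_fisher_information`) and the parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_logpdf(x, mean, sigma):
    """Log-density of N(mean, sigma^2) evaluated at x."""
    return -0.5 * ((x - mean) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))


def instance_llr(x, theta0, theta1, sigma=1.0):
    """Per-instance log-likelihood ratio log p(x|theta1) - log p(x|theta0)."""
    return ((x - theta0) ** 2 - (x - theta1) ** 2) / (2.0 * sigma ** 2)


def bag_llr(bag, theta0, theta1, sigma=1.0):
    """Bag-level log-likelihood ratio: for i.i.d. instances it is a plain sum."""
    return float(np.sum(instance_llr(bag, theta0, theta1, sigma)))


def bag_fisher_information(n_instances, sigma=1.0):
    """Fisher information on the Gaussian mean for a bag of i.i.d. instances.

    One instance carries 1 / sigma^2, so a bag of n instances carries n / sigma^2.
    """
    return n_instances / sigma ** 2


# Toy low-signal case (assumed values): the two hypotheses differ by 0.05 sigma.
rng = np.random.default_rng(0)
bag = rng.normal(loc=0.05, scale=1.0, size=256)

# Reference value computed directly from the joint log-density of the bag.
joint_llr = float(
    np.sum(gaussian_logpdf(bag, 0.05, 1.0) - gaussian_logpdf(bag, 0.0, 1.0))
)
pooled_llr = bag_llr(bag, theta0=0.0, theta1=0.05)
```

In this idealized model an exact per-instance ratio already determines the bag-level ratio, which is the sense in which per-instance methods can be sufficient. The abstract's claim concerns the case where a learned per-instance classifier falls short of that optimum in low-signal regimes.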

Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)

Submission Number: 15863
