Generalization and Learnability in Multiple Instance Regression

Kushal Chauhan; Rishi Saket; Lorne Applebaum; Ashwinkumar Badanidiyuru; Chandan Giri; Aravindan Raghuveer

Published: 26 Apr 2024, Last Modified: 15 Jul 2024

Venue: UAI 2024 poster

License: CC BY 4.0

Keywords: multiple instance regression, generalization, inapproximability

TL;DR: Generalization and inapproximability results, and a model training algorithm, for multiple instance regression

Abstract: Multiple instance regression (MIR) was introduced by Ray and Page (2001) as an analogue of multiple instance learning (MIL) in which we are given bags of feature-vectors (instances), and each bag has a bag-label that matches the label of one (unknown) primary instance from that bag. The goal is to compute a hypothesis regressor consistent with the underlying instance-labels. A natural approach is to find the primary instance assignment and regressor that together minimize the MSE loss on the bags, although no formal generalization guarantees were previously known for it. Our work is the first to prove generalization error bounds for MIR when the bags are drawn i.i.d. at random. Essentially, with high probability, any MIR regressor with low error on the sampled bags also has low error on the underlying instance-label distribution. We next study the complexity of linear regression on MIR bags, which Ray and Page (2001) showed to be NP-hard in general while leaving open the possibility of arbitrarily good approximations. Significantly strengthening previous work, we prove a strong inapproximability bound: even if there exists a zero bag-loss MIR linear regressor on a collection of $2$-sized bags with labels in $[-1,1]$, it is NP-hard to find an MIR linear regressor with bag-loss $< C$ for some absolute constant $C > 0$. Our work also proposes a model training method for MIR based on a novel weighted assignment loss, geared towards handling overlapping bags, which have not received much attention previously. We conduct empirical evaluations on synthetic and real-world datasets showing that our method outperforms the baseline MIR methods.
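
Illustrative sketch (not taken from the paper or its code release): the "best primary instance assignment" objective described in the abstract can be read as follows. For a fixed regressor, each bag contributes the squared error of whichever instance predicts its bag-label best, and the bag-loss is the mean of these per-bag minima. The function names `linear_regressor` and `best_assignment_bag_loss`, the toy bags, and this exact form of the loss are assumptions made for illustration; the paper's formal definitions and its weighted assignment loss may differ.

```python
from typing import Callable, List, Sequence, Tuple

Instance = Sequence[float]
Bag = List[Instance]


def linear_regressor(weights: Sequence[float], bias: float = 0.0) -> Callable[[Instance], float]:
    """Return a hypothetical linear predictor x -> <weights, x> + bias."""
    def predict(x: Instance) -> float:
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return predict


def best_assignment_bag_loss(
    bags: List[Bag],
    bag_labels: Sequence[float],
    predict: Callable[[Instance], float],
) -> Tuple[float, List[int]]:
    """Mean over bags of the smallest squared error within each bag.

    Assumed reading of the abstract: the primary instance of a bag is the
    one whose prediction is closest to the bag-label under `predict`.
    Returns the bag-loss and the chosen instance index for each bag.
    """
    assignment: List[int] = []
    total = 0.0
    for bag, label in zip(bags, bag_labels):
        errors = [(predict(x) - label) ** 2 for x in bag]
        # Ties are broken in favor of the earliest instance in the bag.
        primary = min(range(len(bag)), key=errors.__getitem__)
        assignment.append(primary)
        total += errors[primary]
    return total / len(bags), assignment


# Toy data: two 2-sized bags with labels in [-1, 1].
bags = [[(1.0, 0.0), (0.0, 1.0)], [(0.5, 0.5), (1.0, 1.0)]]
bag_labels = [1.0, 0.5]

loss, assignment = best_assignment_bag_loss(bags, bag_labels, linear_regressor([1.0, 0.0]))
print(loss, assignment)
```

Evaluating the loss for a fixed regressor is cheap, as above. The hardness result in the abstract concerns the joint search over regressors and assignments, which this sketch does not attempt.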

Code Url: https://github.com/google-research/google-research/tree/master/mir_uai24

Submission Number: 709