Learning to Abstain from Uninformative Data

Yikai Zhang; Songzhu Zheng; Mina Dalirrooyfard; Pengxiang Wu; Anderson Schneider; Yuriy Nevmyvaka; Chao Chen

Published: 01 Feb 2023, Last Modified: 13 Feb 2023

Submitted to ICLR 2023

Readers: Everyone

Keywords: PAC Learning, Sample Complexity, Selective Learning, Uninformative Data

TL;DR: Learning from data that is mostly uninformative by using a selective loss

Abstract: Learning and decision making in domains with naturally high noise-to-signal ratios, such as finance or healthcare, is challenging yet extremely important. In this paper, we study the problem of learning and decision making under a general noisy generative process. The distribution has a significant proportion of uninformative data with high label noise, while the rest of the data contains useful information, characterized by low label noise. This dichotomy is present during both training and inference, so uninformative data must also be handled properly at test time. We propose a novel approach to learning under these conditions via a loss inspired by selective learning theory. By minimizing the loss, the model is guaranteed to make near-optimal decisions by distinguishing informative data from uninformative data and making predictions on the former. We build upon the strength of our theoretical guarantees by describing an iterative algorithm that jointly optimizes a predictor and a selector, and we evaluate its empirical performance under a variety of settings.
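To make the predictor/selector split concrete, the sketch below computes the two standard quantities of selective learning: the selective risk (average loss over the samples the selector accepts) and the coverage (fraction of samples accepted). It is a generic illustration, not the loss proposed in the paper; the function name `selective_risk`, the toy losses, and the toy selector decisions are all assumptions made for this example.

```python
def selective_risk(losses, accepted):
    """Return (risk, coverage) for per-sample losses and accept decisions.

    risk     = mean loss over accepted samples
    coverage = fraction of samples the selector accepts
    Illustrative helper only; not the paper's selective loss.
    """
    if len(losses) != len(accepted):
        raise ValueError("losses and accepted must have the same length")
    n_accepted = sum(1 for a in accepted if a)
    coverage = n_accepted / len(losses) if losses else 0.0
    if n_accepted == 0:
        # Convention chosen for this sketch: abstaining everywhere has zero risk.
        return 0.0, coverage
    risk = sum(l for l, a in zip(losses, accepted) if a) / n_accepted
    return risk, coverage


# Toy 0-1 losses of a hypothetical predictor on six samples.
losses = [0, 1, 0, 1, 1, 0]
# A hypothetical selector that accepts only the first three samples.
accepted = [True, True, True, False, False, False]

risk, coverage = selective_risk(losses, accepted)
print(f"selective risk = {risk:.3f}, coverage = {coverage:.2f}")
```

A selector that abstains on high-noise samples lowers the selective risk at the cost of coverage; the trade-off between the two is what a selective loss has to balance.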

Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
