Unbiased Estimates for Multilabel Reductions of Extreme Classification with Missing Labels

16 May 2022 (modified: 05 May 2023), NeurIPS 2022 Submission
Abstract: This paper considers the missing-labels problem in the extreme multilabel classification (XMC) setting, i.e., a setting with a very large label space. The goal in XMC is often to maximize either precision or recall of the top-ranked predictions, which can be achieved by reducing the multilabel problem to a series of binary (One-vs-All) or multiclass (Pick-all-Labels) problems. Missing labels are a ubiquitous phenomenon in XMC tasks, yet the interaction between missing labels and multilabel reductions has so far only been investigated for the One-vs-All reduction. In this paper, we close this gap by providing unbiased estimates for general (non-decomposable) multilabel losses, which enables unbiased estimates of the Pick-all-Labels reduction, as well as of the normalized reductions required for consistency with the recall metric. We show that these estimators suffer from increased variance and may lead to ill-posed optimization problems. To address this issue, we propose to use convex upper bounds that trade off an increase in bias against a strong decrease in variance.
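To make the abstract's terms concrete, below is a minimal sketch (not the paper's implementation) contrasting the two reductions it names and showing a standard inverse-propensity correction that makes the decomposable One-vs-All loss unbiased under missing labels. The propensity model (each relevant label is observed independently with a known probability) and all function and variable names are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch: One-vs-All vs. Pick-all-Labels reductions, and an
# inverse-propensity-weighted (IPW) One-vs-All loss that is unbiased under
# a simple missing-label model. All names and the propensity model are
# assumptions for illustration only.
import numpy as np

def ova_bce_loss(scores, labels):
    """One-vs-All reduction: sum of independent binary losses, one per label."""
    probs = 1.0 / (1.0 + np.exp(-scores))
    eps = 1e-12
    return -np.sum(labels * np.log(probs + eps)
                   + (1 - labels) * np.log(1 - probs + eps))

def pal_loss(scores, labels):
    """Pick-all-Labels reduction: one multiclass (softmax) loss per relevant label."""
    log_probs = scores - np.log(np.sum(np.exp(scores)))
    return -np.sum(labels * log_probs)

def ova_bce_loss_ipw(scores, observed_labels, propensities):
    """IPW One-vs-All loss: unbiased if each relevant label is observed
    independently with probability propensities[l].

    Reweighting the positive term by observed/propensity and the negative
    term by (1 - observed/propensity) makes the expectation over the
    observation process equal the loss on the fully labelled data."""
    probs = 1.0 / (1.0 + np.exp(-scores))
    eps = 1e-12
    pos = -(observed_labels / propensities) * np.log(probs + eps)
    # Note: when a label is observed, the negative weight (1 - 1/p) is
    # negative for p < 1; these negative weights are what inflate the
    # variance and can leave the empirical objective unbounded below.
    neg = -(1 - observed_labels / propensities) * np.log(1 - probs + eps)
    return np.sum(pos + neg)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L = 1000                                  # label-space size (tiny vs. real XMC)
    scores = rng.normal(size=L)
    true_labels = (rng.random(L) < 0.01).astype(float)
    propensities = np.full(L, 0.6)            # assumed known observation probabilities

    print(f"OvA loss on full labels: {ova_bce_loss(scores, true_labels):.3f}")
    print(f"PAL loss on full labels: {pal_loss(scores, true_labels):.3f}")

    # Average the IPW estimator over many draws of the missing-label mask:
    # its mean matches the full-label OvA loss, but individual draws vary widely.
    estimates = [ova_bce_loss_ipw(scores,
                                  true_labels * (rng.random(L) < propensities),
                                  propensities)
                 for _ in range(2000)]
    print(f"mean IPW estimate:       {np.mean(estimates):.3f} "
          f"(std {np.std(estimates):.3f})")
```

The negative per-label weights visible in the IPW term illustrate, under this assumed observation model, the bias/variance trade-off the abstract refers to: the estimator is unbiased but noisy, which motivates replacing it with a convex upper bound.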
Supplementary Material: zip