Abstract: This paper considers binary and multilabel classification problems in a
setting where labels are missing independently and with a known rate. Missing
labels are a ubiquitous phenomenon in extreme multi-label classification (XMC)
tasks, such as matching Wikipedia articles to a small subset out of the
hundreds of thousands of possible tags, where no human annotator can possibly
check the validity of all the negative samples. For this reason,
propensity-scored precision---an unbiased estimate for precision-at-k under a
known noise model---has become one of the standard metrics in XMC. Few
methods, however, account for missing labels already during training, and
all of them are limited to loss functions that decompose into a sum of
contributions from individual labels. A typical approach to training is
to reduce the multilabel problem to a series of binary or multiclass
problems, and it has been shown that if the surrogate task is to be
consistent for optimizing recall, the resulting loss function is not
decomposable over labels. Therefore, this paper develops unbiased estimators
for generic, potentially non-decomposable loss functions. These estimators
suffer from increased variance and may lead to ill-posed optimization
problems, which we address by switching to convex upper bounds. The
theoretical considerations are further supplemented by an experimental study
showing that the switch to unbiased estimators significantly alters the
bias-variance trade-off and may thus require stronger regularization.
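
For illustration, here is a minimal sketch of the quantities the abstract refers to, in our own notation (the paper's exact definitions may differ). With known propensities $p_l = \Pr[\text{label } l \text{ observed} \mid \text{label } l \text{ relevant}]$, observed labels $\tilde{y}_l$, and predicted scores $\hat{y}$, propensity-scored precision-at-$k$ is

$$\mathrm{PSP@}k = \frac{1}{k} \sum_{l \in \mathrm{top}_k(\hat{y})} \frac{\tilde{y}_l}{p_l},$$

which satisfies $\mathbb{E}[\mathrm{PSP@}k] = \mathrm{P@}k$ because $\mathbb{E}[\tilde{y}_l] = p_l y_l$ under the assumed noise model. Analogously, for a label-decomposable loss $\ell$, the estimator

$$\tilde{\ell}(\tilde{y}_l, \hat{y}_l) = \frac{\tilde{y}_l}{p_l}\,\ell(1, \hat{y}_l) + \Bigl(1 - \frac{\tilde{y}_l}{p_l}\Bigr)\ell(0, \hat{y}_l)$$

is unbiased for $\ell(y_l, \hat{y}_l)$. Note that the coefficient on $\ell(0, \hat{y}_l)$ becomes negative whenever $\tilde{y}_l = 1$ and $p_l < 1$, which hints at the increased variance and ill-posedness the abstract mentions.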
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Takashi_Ishida1
Submission Number: 4497