Abstract: Multilabel classification is the task of assigning multiple labels to each example. Current models reduce the multilabel setting to either multiple binary classifications or a multiclass classification, allowing for the use of existing loss functions (sigmoid, cross-entropy, logistic, etc.). These multilabel classification reductions do not accommodate the prediction of varying numbers of labels per example. Moreover, the loss functions are distant estimates of the performance metrics. We propose sigmoidF1, a loss function that is an approximation of the F1 score that (i) is smooth and tractable for stochastic gradient descent, (ii) naturally approximates a multilabel metric, and (iii) estimates both label suitability and label counts. We show that any confusion matrix metric can be formulated with a smooth surrogate. We evaluate the proposed loss function on text and image datasets, and with a variety of metrics, to account for the complexity of multilabel classification evaluation. sigmoidF1 outperforms other loss functions on one text and two image datasets over several metrics. These results show the effectiveness of using inference-time metrics as loss functions for non-trivial classification problems like multilabel classification.
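The abstract's core idea, a smooth F1 surrogate built from sigmoid-relaxed confusion-matrix counts, can be sketched as follows. This is a minimal NumPy illustration under our own assumptions, not the authors' reference implementation; the parameter names `beta` (sigmoid sharpness) and `eta` (offset) are illustrative.

```python
import numpy as np

def sigmoid_f1_loss(logits, targets, beta=1.0, eta=0.0):
    """Smooth F1 surrogate: replace hard 0/1 predictions with a
    (optionally sharpened/shifted) sigmoid, so the confusion-matrix
    counts, and hence F1, become differentiable in the logits."""
    # Soft predictions in (0, 1): S(u) = 1 / (1 + exp(-beta * (u + eta)))
    s = 1.0 / (1.0 + np.exp(-beta * (logits + eta)))
    # Soft confusion-matrix counts, summed over the batch
    tp = np.sum(s * targets)          # soft true positives
    fp = np.sum(s * (1.0 - targets))  # soft false positives
    fn = np.sum((1.0 - s) * targets)  # soft false negatives
    eps = 1e-8  # numerical guard against an empty-positive batch
    soft_f1 = 2.0 * tp / (2.0 * tp + fn + fp + eps)
    # Minimizing (1 - soft F1) maximizes the smooth F1 surrogate
    return 1.0 - soft_f1
```

Any other confusion-matrix metric (precision, recall, Fbeta) can be relaxed the same way by substituting its formula for the `soft_f1` line.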
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:

# Update of 07.09.2022

- We explicitly acknowledge the multiclass literature on F1-score surrogates in the introduction and conclusion.
- We added a discussion section to report on threshold-moving-based methods from (Decubber et al., 2018) and compare them to fixed-thresholding results.
- We added (or, when already present, discussed in more detail) the following references:
  - Stijn Decubber, Thomas Mortier, Krzysztof Dembczynski, Willem Waegeman, "Deep F-Measure Maximization in Multi-label Classification: A Comparative Study"
  - (Dembczynski et al., 2010); (Wydmuch et al., 2018)
  - a soft-margin SVM reference
  - multiclass non-decomposable performance measures: Narasimhan et al., "Optimizing Non-decomposable Performance Measures: A Tale of Two Classes"; Narasimhan et al., "Consistent Multiclass Algorithms for Complex Performance Measures"; Sanyal et al., "Optimizing non-decomposable measures with deep networks"
  - Grabocka et al., "Learning Surrogate Losses"
  - Gai et al., "Gradient-based learning for F-measure and other performance metrics"
  - (Eban et al., 2017)
- We added a discussion of good hyperparameters for image and for text separately in the appendix.
- Some corrections we made:
  - Definition 1: removed the double conditioning
  - Table 1: author names rather than years
Assigned Action Editor: ~Aditya_Menon1
Submission Number: 148