Abstract: Commonly used evaluation metrics in multi-label learning all involve base loss functions, and the theoretical guarantees of multi-label learning often rely on the properties of these base loss functions. Some recent theoretical works have used the Lipschitz continuity of base loss functions to prove generalization bounds for multi-label learning, but the impact of the smoothness of base loss functions on the generalization bounds has remained completely unknown. To fill this gap in the generalization theory of multi-label learning, we develop novel vector-contraction inequalities for smooth base loss functions and derive tight generalization bounds with no dependency on the number of labels, up to logarithmic terms. We then exploit local Rademacher complexity to develop novel local vector-contraction inequalities for smooth base loss functions, which yield generalization bounds with a tighter dependency on the number of labels and a faster convergence rate with respect to the number of examples. In addition, we derive tight generalization bounds for Macro-Averaged AUC, again with no dependency on the number of labels up to logarithmic terms, by exploiting the Lipschitz continuity and the smoothness of base loss functions, respectively. Our state-of-the-art results provide general theoretical guarantees for the generalization of multi-label learning.
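For orientation only (this sketch is not part of the submission): bounds of the kind described above are typically obtained by combining a standard Rademacher-complexity generalization bound with a vector-contraction inequality such as Maurer's. The generic forms below are standard and merely illustrate the objects named in the abstract; they are not the paper's sharpened versions. For a loss class taking values in $[0,1]$, with probability at least $1-\delta$,
\[
R(h) \;\le\; \widehat{R}_n(h) \;+\; 2\,\mathfrak{R}_n(\mathcal{F}) \;+\; \sqrt{\frac{\log(1/\delta)}{2n}},
\]
and, for mappings $g_i:\mathbb{R}^c \to \mathbb{R}$ that are $L$-Lipschitz with respect to the Euclidean norm,
\[
\mathbb{E}_{\varepsilon}\sup_{h\in\mathcal{H}} \sum_{i=1}^{n} \varepsilon_i\, g_i\big(h(x_i)\big)
\;\le\; \sqrt{2}\,L\; \mathbb{E}_{\varepsilon}\sup_{h\in\mathcal{H}} \sum_{i=1}^{n}\sum_{k=1}^{c} \varepsilon_{ik}\, h_k(x_i),
\]
where $n$ is the number of examples and $c$ is the number of labels. A naive application of such inequalities introduces the dependence on $c$ that the abstract's bounds remove, up to logarithmic terms.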
Lay Summary: Multi-label learning is one of the most studied and practically important machine learning paradigms, in which each object is represented by a single instance while being associated with a set of labels. Some recent theoretical works have made preliminary explorations into the generalization of multi-label learning; however, establishing generalization bounds with faster convergence rates with respect to the number of examples, and understanding the impact of the smoothness of base losses on these bounds, remain unexplored. We thoroughly explore the smoothness of base losses and develop novel vector-contraction inequalities that yield tight bounds with no dependency on the number of labels. We further exploit local Rademacher complexity and develop novel local vector-contraction inequalities that yield bounds with no dependency on the number of labels and a faster rate with respect to the number of examples. In addition, we derive tight bounds with no dependency on the number of labels for Macro-Averaged AUC with both Lipschitz and smooth base losses. Our theoretical analysis yields tighter and faster bounds and reveals the impact of smooth base losses on generalization.
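For readers unfamiliar with the metric mentioned above, the following is a standard definition of Macro-Averaged AUC over $c$ labels (given here for illustration, not taken from the submission):
\[
\mathrm{MacroAUC}(h) \;=\; \frac{1}{c}\sum_{k=1}^{c} \frac{1}{|S_k^{+}|\,|S_k^{-}|}
\sum_{x \in S_k^{+}} \sum_{x' \in S_k^{-}} \mathbb{1}\big[h_k(x) > h_k(x')\big],
\]
where $S_k^{+}$ and $S_k^{-}$ denote the examples with and without label $k$. In analysis the indicator is typically replaced by a Lipschitz or smooth base loss, which is exactly the property the bounds described above exploit.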
Primary Area: Theory->Learning Theory
Keywords: Multi-Label Learning, Generalization Bound
Submission Number: 14018