Abstract: Extreme multi-label text classification (XMTC) is the task of tagging each document with the relevant labels from a very large set of predefined category labels. The main challenge stems from a highly skewed label distribution, where the majority of the categories (namely the tail labels) have very few training instances. Recent benchmark evaluations have focused on micro-averaging metrics, where the performance on tail labels can be easily overshadowed by that on the high-frequency labels (namely the head labels). This paper presents a re-evaluation of state-of-the-art (SOTA) methods based on the binned macro-averaging F1 instead, revealing new insights into the strengths and weaknesses of representative methods, especially in tail label prediction.