Abstract: Hierarchical Text Classification (HTC) aims to categorize text data based on a structured label hierarchy, generating predicted labels that form a local hierarchical structure. Previous approaches have employed various methods to integrate text and label semantics, but they often overlooked the importance of local hierarchical context. By considering the local hierarchy, which encapsulates relationships between labels within the context of individual samples, we can enhance the association between text and its related labels. To address this, we propose a Margin Separation Loss (MSL), which explicitly models text-label semantic associations in a local hierarchy-aware manner. We obtain positive labels for each sample using the local hierarchy and employ the global hierarchy to identify corresponding negative labels. Positive and negative pairs are created by pairing the text sample with its positive and negative labels. MSL enforces a margin between positive and negative pairs at each hierarchical level, which ensures that similarity within positive pairs is maximized while similarity within negative pairs is minimized in the embedding space, thereby aligning text representations with their related labels. Building upon this, we introduce the Hierarchical Text-Label Association ($\mathbf{HTLA}^{\mathrm{n}}$) model, which uses BERT to encode text and a customized Graphormer to encode the label hierarchy, and fuses the text and label embeddings to generate composite representations. Experimental results on benchmark datasets and comparison with existing baselines demonstrate the effectiveness of $\mathbf{HTLA}^{\mathrm{n}}$ for HTC.
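
To make the level-wise margin idea concrete, the following is a minimal sketch of a hinge-style margin loss, not the formulation used in this work. It rests on several assumptions that the abstract does not state: cosine similarity as the similarity measure, negatives taken as all non-positive labels at the same depth of the global hierarchy, a fixed margin of 0.5, and plain averaging over pairs. The function names `cosine` and `margin_separation_loss` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def margin_separation_loss(text_vec, label_vecs, levels, positives, margin=0.5):
    """Hinge loss over (positive, negative) label pairs, computed level by level.

    text_vec:   embedding of one text sample
    label_vecs: dict mapping label -> embedding
    levels:     dict mapping label -> depth in the global hierarchy
    positives:  set of labels in this sample's local hierarchy
    Assumption: negatives are the non-positive labels at the same depth.
    """
    losses = []
    for depth in sorted(set(levels.values())):
        at_depth = [label for label, d in levels.items() if d == depth]
        pos = [label for label in at_depth if label in positives]
        neg = [label for label in at_depth if label not in positives]
        for p in pos:
            for n in neg:
                # Penalize pairs whose similarity gap is smaller than the margin.
                gap = cosine(text_vec, label_vecs[p]) - cosine(text_vec, label_vecs[n])
                losses.append(max(0.0, margin - gap))
    return sum(losses) / len(losses) if losses else 0.0
```

Under these assumptions, the loss is zero once every positive label at a level is at least `margin` more similar to the text than every negative label at that level, and it grows as negatives move closer to the text than positives.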