Bridging Structure and Semantics via Path-level Alignment for Hierarchical Multi-Label Text Classification
Keywords: hierarchical multi-label text classification, representation learning
Abstract: Hierarchical multi-label text classification (HMTC) aims to assign each document multiple labels organized in a predefined hierarchy, which makes it challenging to model both the hierarchical structure and fine-grained label semantics. Existing approaches often rely on hierarchy-specific prediction architectures or hard consistency constraints, which can limit flexibility and robustness, especially for deep and imbalanced hierarchies. In this work, we propose a hierarchy-aware representation learning framework that reformulates HMTC as a path-level semantic alignment problem. We introduce PathSimNCE, a hierarchy-aware contrastive objective that aligns text with hierarchical paths using structure-based similarity, and add an auxiliary description alignment objective that uses natural-language path descriptions generated offline by a large language model, so no inference-time overhead is introduced.
Extensive experiments on three benchmark datasets show that our approach achieves competitive or state-of-the-art performance using a standard RoBERTa encoder and a unified classifier. Further ablation and depth-wise analyses show that hierarchy-aware and semantic supervision play complementary roles, significantly improving performance on fine-grained and rare labels.
Paper Type: Long
Research Area: Information Extraction and Retrieval
Research Area Keywords: Information Extraction, Information Retrieval and Text Mining
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 2755