PaCoClass: Leveraging Path Selection and Conditional Classifier for Hierarchical Text Classification with Minimal Supervision
Keywords: Hierarchical Text Classification, Weakly Supervised, Zero-Shot, Large Language Models
Abstract: Hierarchical Text Classification (HTC) aims to categorize documents into specific paths within a label taxonomy. Most existing work on this task is fully supervised and demands substantial time and effort from domain experts for data annotation. In contrast, only a handful of studies have addressed HTC under severely constrained supervision. These methods usually adopt a top-down strategy to identify document-relevant classes, which risks missing truly essential candidates, and they often rely on flat classifier architectures, which may produce predictions that violate hierarchical constraints. To address these limitations, we propose PaCoClass, a novel weakly supervised HTC framework. First, PaCoClass introduces a Bidirectional Path Consistency Scoring mechanism that quantifies the semantic alignment between a document and label paths by combining bottom-up candidate retrieval with top-down consistency constraints. Second, PaCoClass employs an LLM-Enhanced Path Refinement strategy, which uses Large Language Models (LLMs) to further refine high-scoring paths. Third, it replaces flat classifiers with a conditional classifier architecture that inherently enforces hierarchical constraints and captures intrinsic label dependencies. Experiments demonstrate that our framework consistently outperforms several well-known competing HTC methods at all stages.
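To make the idea of combining bottom-up retrieval with a top-down consistency check concrete, here is a minimal sketch. It is not PaCoClass's actual scoring mechanism. It assumes that per-label document relevance scores are already available (for example, similarities from some text encoder). The function names and the scoring rule are hypothetical illustrations: the score is the mean node score multiplied by the fraction of levels at which the node beats all of its siblings. Bottom-up, the top-k leaves are retrieved and traced to the root. Top-down, each resulting path is penalized wherever a node does not win among its siblings.

```python
from collections import defaultdict


def build_children(parent):
    """Map each parent label (None for the top level) to its child labels."""
    children = defaultdict(list)
    for label, par in parent.items():
        children[par].append(label)
    return children


def trace_to_root(leaf, parent):
    """Bottom-up step: follow parent links from a leaf; return the path top-down."""
    path = []
    node = leaf
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def path_score(path, parent, children, scores):
    """Hypothetical rule: mean node score scaled by top-down consistency,
    i.e. the fraction of levels where the node scores at least as high as
    every sibling under the same parent."""
    mean_score = sum(scores[n] for n in path) / len(path)
    consistent = sum(
        1
        for node in path
        if scores[node] >= max(scores[s] for s in children[parent[node]])
    )
    return mean_score * consistent / len(path)


def rank_paths(scores, parent, k=2):
    """Retrieve the top-k leaves, expand them to full paths, and rank the paths."""
    children = build_children(parent)
    leaves = [label for label in parent if not children[label]]
    top_leaves = sorted(leaves, key=lambda l: scores[l], reverse=True)[:k]
    paths = [trace_to_root(leaf, parent) for leaf in top_leaves]
    scored = [(path_score(p, parent, children, scores), p) for p in paths]
    return sorted(scored, key=lambda t: t[0], reverse=True)


if __name__ == "__main__":
    # Toy two-level taxonomy (hypothetical labels) and assumed relevance scores.
    parent = {"sports": None, "science": None,
              "soccer": "sports", "tennis": "sports",
              "physics": "science", "biology": "science"}
    scores = {"sports": 0.8, "science": 0.3, "soccer": 0.6,
              "tennis": 0.2, "physics": 0.7, "biology": 0.1}
    print([p for _, p in rank_paths(scores, parent, k=2)])
    # → [['sports', 'soccer'], ['science', 'physics']]
```

In the toy example, "physics" is the highest-scoring leaf, so bottom-up retrieval keeps it as a candidate rather than pruning it early. Its parent "science" loses to "sports" at the top level, however, so the top-down check ranks the "sports → soccer" path first.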
Paper Type: Long
Research Area: Hierarchical Structure Prediction, Syntax, and Parsing
Research Area Keywords: Efficient/Low-Resource Methods for NLP
Contribution Types: Approaches to low-resource settings
Languages Studied: English
Submission Number: 3681