Constrained Sequence-to-Tree Generation for Hierarchical Text Classification

Published: 01 Jan 2022, Last Modified: 18 Aug 2024. Venue: SIGIR 2022. License: CC BY-SA 4.0
Abstract: Hierarchical Text Classification (HTC) is a challenging task in which a document can be assigned to multiple hierarchically structured categories within a taxonomy. Most prior studies treat HTC as a flat multi-label classification problem, which inevitably leads to the "label inconsistency" problem. In this paper, we formulate HTC as a sequence generation task and introduce a sequence-to-tree framework (Seq2Tree) for modeling the hierarchical label structure. Moreover, we design a constrained decoding strategy with a dynamic vocabulary to ensure the label consistency of the results. Compared with previous work, the proposed approach achieves significant and consistent improvements on three benchmark datasets.
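The idea of constrained decoding with a dynamic vocabulary can be illustrated with a minimal sketch. This is not the authors' implementation: the toy taxonomy, the `<pop>` and `<eos>` tokens, the depth-first traversal, and the `toy_score` function (standing in for a trained decoder) are all assumptions made for illustration. At each step the decoder may only choose among the children of the current node, or move back up, so every generated label has its parent in the output.

```python
from typing import Callable, Dict, List, Set

# Hypothetical toy taxonomy (parent label -> child labels); not from the paper.
TAXONOMY: Dict[str, List[str]] = {
    "ROOT": ["Science", "Sports"],
    "Science": ["Physics", "Biology"],
    "Sports": ["Tennis"],
}
POP = "<pop>"  # assumed token: return to the parent node
EOS = "<eos>"  # assumed token: stop generating


def allowed_tokens(stack: List[str], emitted: Set[str]) -> List[str]:
    """Dynamic vocabulary: unvisited children of the current node, plus POP or EOS."""
    current = stack[-1]
    children = [c for c in TAXONOMY.get(current, []) if c not in emitted]
    # At the root the sequence may end; elsewhere the decoder may go back up.
    return children + ([EOS] if current == "ROOT" else [POP])


def constrained_greedy_decode(
    score: Callable[[List[str], str], float], max_steps: int = 50
) -> List[str]:
    """Greedy decoding restricted to the dynamic vocabulary at every step."""
    stack: List[str] = ["ROOT"]
    emitted: Set[str] = set()
    output: List[str] = []
    for _ in range(max_steps):
        candidates = allowed_tokens(stack, emitted)
        token = max(candidates, key=lambda t: score(output, t))
        if token == EOS:
            break
        output.append(token)
        if token == POP:
            stack.pop()
        else:
            emitted.add(token)
            stack.append(token)
    return output


def labels_from_sequence(sequence: List[str]) -> List[str]:
    """Drop structural tokens and keep only the predicted labels."""
    return [t for t in sequence if t != POP]


def toy_score(prefix: List[str], token: str) -> float:
    """Stand-in for a model score: prefers the labels Science and Physics."""
    if token in {"Science", "Physics"}:
        return 1.0
    if token == POP:
        return 0.5
    if token == EOS:
        return 0.4
    return 0.0


sequence = constrained_greedy_decode(toy_score)
print(labels_from_sequence(sequence))
```

Because a child label can only be generated after its parent has been entered, the decoded label set is consistent with the hierarchy by construction, whatever the scoring function prefers.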