A Unified Document-Level Chinese Discourse Parser on Different Granularity Levels

Published: 2023, Last Modified: 17 Jul 2025. Venue: ICDAR (1) 2023. License: CC BY-SA 4.0
Abstract: Discourse parsing aims to recover the structure and semantics of a document. Some previous studies parse documents at multiple levels of granularity but disregard the connections between those levels. In addition, almost all Chinese discourse parsing approaches concentrate on a single granularity level because annotated corpora are lacking. To address these issues, we propose a unified document-level Chinese discourse parser that operates on multiple granularity levels and leverages the connections between paragraphs and Elementary Discourse Units (EDUs) in a document. Specifically, we first identify EDU-level discourse trees and then introduce a structural encoding module to capture EDU-level structural and semantic information, which significantly improves the construction of paragraph-level discourse trees. Moreover, we construct the Unified Chinese Discourse TreeBank (UCDTB), which includes 467 articles annotated from the clause level up to the whole article, filling the gap in unified corpus resources for Chinese discourse parsing. Experiments on both the Chinese UCDTB and the English RST-DT show that our model outperforms the state-of-the-art (SOTA) baselines.
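
As a rough illustration of the two granularity levels described in the abstract, the sketch below represents a document as a paragraph-level tree whose leaves each hold an EDU-level tree. It is not the paper's implementation: the `Node` class, the `edus` and `bracket` helpers, the relation labels, and the example sentences are all hypothetical, and the bracketed string is only a toy stand-in for the kind of structural information a structural encoding module might consume.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical node shared by both granularity levels."""
    relation: Optional[str] = None             # label on internal nodes (illustrative)
    children: List["Node"] = field(default_factory=list)
    text: Optional[str] = None                 # set only on EDU leaves
    edu_tree: Optional["Node"] = None          # set only on paragraph leaves


def edus(node: Node) -> List[str]:
    """Collect EDU texts left to right, descending through paragraph leaves."""
    if node.text is not None:
        return [node.text]
    if node.edu_tree is not None:
        return edus(node.edu_tree)
    return [t for child in node.children for t in edus(child)]


def bracket(node: Node) -> str:
    """Linearize the tree shape: 'e' marks an EDU, [...] wraps a paragraph."""
    if node.text is not None:
        return "e"
    if node.edu_tree is not None:
        return "[" + bracket(node.edu_tree) + "]"
    inner = " ".join(bracket(child) for child in node.children)
    return "(" + str(node.relation) + " " + inner + ")"


# Paragraph 1 contains two EDUs joined by an assumed "Cause" relation.
p1 = Node(edu_tree=Node("Cause", [Node(text="It rained,"),
                                  Node(text="so the match was cancelled.")]))
# Paragraph 2 is a single EDU.
p2 = Node(edu_tree=Node(text="Fans were refunded."))
# The paragraph-level tree links the two paragraphs (assumed "Elaboration").
doc = Node("Elaboration", [p1, p2])

print(edus(doc))
print(bracket(doc))
```

In this toy layout the EDU-level trees are built first and then attached to paragraph leaves, mirroring the order of steps the abstract describes; how the actual parser encodes and combines them is not specified here.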