Automatic Identification of Chinese Paired Discourse Connectives

Published: 01 Jan 2023, Last Modified: 14 Jun 2024. Venue: ICSC 2023. License: CC BY-SA 4.0
Abstract: This paper describes our approach to automatically identifying paired Discourse Connectives (DCs) in Chinese texts. DCs are terms that connect two text spans and signal the discourse relation between them. Most DCs consist of consecutive words (e.g., as a result); however, paired DCs are composed of non-consecutive words that together signal the discourse relation (e.g., on one hand … on the other hand). Although paired DCs are not common in English, they are very frequent in Chinese. The contribution of this paper is twofold. First, we propose a methodology for the automatic identification of Chinese paired DCs. Second, we present a new corpus based on the Chinese Discourse Treebank (CDTB) [1] annotated with paired DCs. To identify paired DCs, we experimented with two main approaches: hypothesis testing and supervised machine learning. Although the hypothesis testing approaches led to lower-than-expected results, the simple machine learning models achieved F-scores between 72.5% and 75.6% with no fine-tuning.
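The abstract does not state which hypothesis tests were used. Purely as an illustration of the general idea, the sketch below applies a Pearson chi-square test of independence to a 2x2 sentence-level co-occurrence table for one candidate pair. The function names, the toy sentences, and the choice of test are assumptions made for this example, not the paper's method; the input is assumed to be already word-segmented, and word order is ignored.

```python
from typing import Iterable, Sequence, Tuple


def pair_counts(
    sentences: Iterable[Sequence[str]], first: str, second: str
) -> Tuple[int, int, int, int]:
    """Count sentences by presence of each candidate word.

    Returns (both, first_only, second_only, neither).
    Hypothetical helper: presence only, word order is not checked.
    """
    both = first_only = second_only = neither = 0
    for tokens in sentences:
        has_first = first in tokens
        has_second = second in tokens
        if has_first and has_second:
            both += 1
        elif has_first:
            first_only += 1
        elif has_second:
            second_only += 1
        else:
            neither += 1
    return both, first_only, second_only, neither


def chi_square_2x2(
    both: int, first_only: int, second_only: int, neither: int
) -> float:
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    n = both + first_only + second_only + neither
    row_first = both + first_only        # sentences containing `first`
    row_no_first = second_only + neither
    col_second = both + second_only      # sentences containing `second`
    col_no_second = first_only + neither
    denom = row_first * row_no_first * col_second * col_no_second
    if denom == 0:
        # A zero marginal means the test is undefined; report no association.
        return 0.0
    return n * (both * neither - first_only * second_only) ** 2 / denom


if __name__ == "__main__":
    # Toy, pre-segmented sentences (assumed data, for illustration only).
    toy = [
        ["因为", "下雨", "所以", "我", "没", "去"],
        ["因为", "太", "忙", "所以", "迟到"],
        ["我", "喜欢", "茶"],
        ["因为", "他", "病", "了"],
    ]
    stat = chi_square_2x2(*pair_counts(toy, "因为", "所以"))
    # With 1 degree of freedom, 3.841 is the critical value at the 0.05 level.
    print(f"chi-square = {stat:.3f}, significant at 0.05: {stat > 3.841}")
```

A statistic above 3.841 (one degree of freedom, 0.05 level) would suggest the two words co-occur more or less often than chance predicts. On real data the counts would come from a corpus such as the CDTB, and with small expected counts the chi-square approximation is unreliable, so an exact test may be a better fit.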