A Multi-Task Dataset for Assessing Discourse Coherence in Chinese Essays: Structure, Theme, and Logic Analysis

Published: 07 Oct 2023, Last Modified: 01 Dec 2023, EMNLP 2023 Main
Submission Type: Regular Long Paper
Submission Track: Discourse and Pragmatics
Keywords: discourse coherence assessment
TL;DR: We present a high-quality dataset for the analysis of discourse coherence.
Abstract: This paper introduces the Chinese Essay Discourse Coherence Corpus (CEDCC), a multi-task dataset for assessing discourse coherence. Existing research tends to focus on isolated dimensions of discourse coherence, a gap that the CEDCC addresses by integrating coherence grading, topical continuity, and discourse relations. This approach, alongside detailed annotations, captures the subtleties of real-world texts and stimulates progress in Chinese discourse coherence analysis. Our contributions include the development of the CEDCC, the establishment of baselines for further research, and the demonstration of the impact of coherence on discourse relation recognition and automated essay scoring. The dataset and related code are available at https://github.com/cubenlp/CEDCC_corpus.
Submission Number: 1430