Discourse Sense Flows: Modelling the Rhetorical Style of Documents across Various Domains

Published: 07 Oct 2023, Last Modified: 01 Dec 2023
Venue: EMNLP 2023 Findings
Submission Type: Regular Long Paper
Submission Track: Discourse and Pragmatics
Keywords: rhetorical style, cross-domain, discourse parsing, discourse signals, connecting phrases, sense recognition
TL;DR: Our work extracts discourse relations and linearizes their senses for comparing the rhetorical style of documents across various corpora.
Abstract: Recent research on shallow discourse parsing has given renewed attention to the role of discourse relation signals, in particular explicit connectives and so-called alternative lexicalizations. In our work, we first develop new models for extracting signals and classifying their senses, both for explicit connectives and alternative lexicalizations, based on the Penn Discourse Treebank v3 corpus. Thereafter, we apply these models to various raw corpora, and we introduce 'discourse sense flows', a new way of modeling the rhetorical style of a document by the linear order of coherence relations, as captured by the PDTB senses. The corpora span several genres and domains, and we undertake comparative analyses of the sense flows, as well as experiments on automatic genre/domain discrimination using discourse sense flow patterns as features. We find that n-gram patterns are indeed stronger predictors than simple sense (unigram) distributions.
Submission Number: 3138
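
The abstract describes a 'discourse sense flow' as the linear order of PDTB senses in a document, with n-gram patterns over that order used as features for genre/domain discrimination. The following is a minimal sketch of that idea, not the authors' implementation: the function names (sense_ngrams, flow_features), the example sense sequence, and the per-order relative-frequency normalization are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def sense_ngrams(flow: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of sense labels, in document order."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return [tuple(flow[i:i + n]) for i in range(len(flow) - n + 1)]


def flow_features(flow: Sequence[str], max_n: int = 2) -> Dict[Tuple[str, ...], float]:
    """Map each n-gram (n = 1..max_n) to its relative frequency within its order.

    Normalizing per order is an assumption of this sketch; it keeps documents
    of different lengths comparable.
    """
    features: Dict[Tuple[str, ...], float] = {}
    for n in range(1, max_n + 1):
        grams = sense_ngrams(flow, n)
        for gram, count in Counter(grams).items():
            features[gram] = count / len(grams)
    return features


# Hypothetical sense flow for one document (labels follow the PDTB
# "Level1.Level2" naming, but the sequence itself is invented).
example_flow = [
    "Contingency.Cause",
    "Comparison.Contrast",
    "Expansion.Conjunction",
    "Contingency.Cause",
    "Comparison.Contrast",
]

example_features = flow_features(example_flow, max_n=2)
```

With max_n=1 the features reduce to the plain sense (unigram) distribution; raising max_n adds the order-sensitive patterns that the abstract reports to be stronger predictors. The resulting dictionaries could be passed to any standard classifier after vectorization.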