Comparing Methods for Segmenting Elementary Discourse Units in a French Conversational Corpus

Laurent Prévot, Julie Hunter, Philippe Muller

Published: 2023, Last Modified: 07 Jan 2026, NoDaLiDa 2023, CC BY-SA 4.0

Abstract: While discourse parsing has made considerable progress in recent years, discourse segmentation of conversational speech remains difficult. In this paper, we use a French data set that has been manually segmented into discourse units to compare two approaches to discourse segmentation: fine-tuning existing systems on the manual segmentation vs. using hand-crafted labelling rules to develop a weakly supervised segmenter (data programming). Our results show that both approaches yield similar performance in terms of F-score, while data programming requires less manual annotation work. In a second experiment, we vary the amount of training data used for fine-tuning and show that a small amount of hand-labelled data is enough to obtain good results, although these are significantly lower than in the first experiment, which used all the available annotated data.
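To make the weakly supervised approach and the evaluation concrete, here is a minimal sketch under stated assumptions; it is not the authors' system. The two labelling rules (`lf_connective`, `lf_after_determiner`), their word lists, and the toy gold segmentation are all hypothetical. Data programming normally combines rule outputs with a learned label model that weights each rule by its estimated accuracy; the sketch substitutes a plain majority vote. It also assumes the F-score is computed over token positions that open a discourse unit, which may differ from the paper's exact protocol (for example, whether turn-initial boundaries are counted).

```python
from collections import Counter
from typing import Callable, List

# Label values; ABSTAIN lets a rule say nothing about a token.
BOUNDARY, NO_BOUNDARY, ABSTAIN = 1, 0, -1

# A labelling function votes on whether a new discourse unit starts at token i.
LabellingFunction = Callable[[List[str], int], int]

CONNECTIVES = {"mais", "donc", "alors", "puis"}  # hypothetical rule lexicon
DETERMINERS = {"le", "la", "les", "un", "une", "des"}  # hypothetical rule lexicon


def lf_connective(tokens: List[str], i: int) -> int:
    """Hypothetical rule: a discourse connective opens a new unit."""
    return BOUNDARY if tokens[i].lower() in CONNECTIVES else ABSTAIN


def lf_after_determiner(tokens: List[str], i: int) -> int:
    """Hypothetical rule: never start a unit right after a determiner."""
    if i > 0 and tokens[i - 1].lower() in DETERMINERS:
        return NO_BOUNDARY
    return ABSTAIN


LFS: List[LabellingFunction] = [lf_connective, lf_after_determiner]


def majority_vote(tokens: List[str], lfs: List[LabellingFunction]) -> List[int]:
    """Combine rule votes per token; ties and all-abstain default to NO_BOUNDARY."""
    labels = []
    for i in range(len(tokens)):
        if i == 0:
            labels.append(BOUNDARY)  # assume the first token of a turn opens a unit
            continue
        votes = Counter(v for v in (lf(tokens, i) for lf in lfs) if v != ABSTAIN)
        labels.append(BOUNDARY if votes[BOUNDARY] > votes[NO_BOUNDARY] else NO_BOUNDARY)
    return labels


def boundary_f1(gold: List[int], pred: List[int]) -> float:
    """F-score over the positions labelled BOUNDARY in gold vs. prediction."""
    gold_b = {i for i, y in enumerate(gold) if y == BOUNDARY}
    pred_b = {i for i, y in enumerate(pred) if y == BOUNDARY}
    tp = len(gold_b & pred_b)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred_b), tp / len(gold_b)
    return 2 * precision * recall / (precision + recall)


tokens = "bon le chat dort mais on part donc voilà".split()
gold = [1, 1, 0, 0, 1, 0, 0, 0, 0]  # toy gold segmentation, not from the corpus
pred = majority_vote(tokens, LFS)
print(pred)  # [1, 0, 0, 0, 1, 0, 0, 1, 0]
print(round(boundary_f1(gold, pred), 3))  # 0.667
```

Here the rules miss the gold boundary before "le" (hurting recall) and add a spurious one before "donc" (hurting precision), giving precision = recall = 2/3. Treating boundaries as sets of positions keeps the metric independent of how the segmenter was trained, so the same function could score a fine-tuned model and a rule-based one side by side.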

External IDs: dblp:conf/nodalida/PrevotHM23