A Reference-free Segmentation Quality Index (Segmentation ReFree)

Anonymous

16 Dec 2022 (modified: 05 May 2023) | ACL ARR 2022 December Blind Submission | Readers: Everyone
Abstract: Topic segmentation, in the context of natural language processing, is the task of finding boundaries in a sequence of sentences so that adjacent sentences are grouped together and each boundary falls at a shift in semantic meaning. Currently, the quality of a segmentation is assessed by comparing its boundaries against those of a known-good reference. As a result, a segmentation cannot be scored without a human annotator to produce that reference, which can be costly and time-consuming. This work seeks to improve the assessment of segmentation by proposing a metric that requires no reference. The metric takes advantage of the fact that sentence-level segmentation generally aims to place segment boundaries at semantic boundaries within the text. It applies a modified cluster validity metric to semantic embeddings of the sentences to determine the quality of the segmentation. The proposed metric is compared against existing reference-based segmentation metrics to demonstrate that it correlates strongly with them and to show its relative accuracy. A Python library implementing the metric is released under the MIT license and the repository is available at \emph{url to be added}.
Paper Type: long
Research Area: Semantics: Sentence-level Semantics, Textual Inference and Other areas
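
To make the idea in the abstract concrete, the following is a minimal sketch of a reference-free score built from a cluster validity index over sentence embeddings. It is not the paper's metric: the abstract does not specify which cluster validity index is used or how it is modified, so this sketch assumes a Davies-Bouldin-style ratio restricted to neighbouring segments. The function names (`segment_spans`, `adjacent_db_score`) and the toy two-dimensional "embeddings" are hypothetical and stand in for real sentence embeddings.

```python
import numpy as np


def segment_spans(boundaries, n_sentences):
    """Turn boundary indices (the start index of each new segment) into (start, end) spans."""
    boundaries = list(boundaries)
    if any(b <= 0 or b >= n_sentences for b in boundaries):
        raise ValueError("boundaries must lie strictly between 0 and n_sentences")
    if any(a >= b for a, b in zip(boundaries, boundaries[1:])):
        raise ValueError("boundaries must be strictly increasing")
    starts = [0] + boundaries
    ends = boundaries + [n_sentences]
    return list(zip(starts, ends))


def adjacent_db_score(embeddings, boundaries, eps=1e-12):
    """Davies-Bouldin-style score over adjacent segments only (assumed variant).

    Lower values mean segments are tight and well separated from their neighbours.
    """
    X = np.asarray(embeddings, dtype=float)
    spans = segment_spans(boundaries, len(X))
    if len(spans) < 2:
        raise ValueError("at least one boundary is required")

    # Centroid and mean distance-to-centroid (scatter) for every segment.
    centroids = [X[s:e].mean(axis=0) for s, e in spans]
    scatter = [
        np.linalg.norm(X[s:e] - c, axis=1).mean()
        for (s, e), c in zip(spans, centroids)
    ]

    ratios = []
    for i in range(len(spans)):
        # Only the previous and next segment are compared, since text is sequential.
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(spans)]
        worst = max(
            (scatter[i] + scatter[j])
            / (np.linalg.norm(centroids[i] - centroids[j]) + eps)
            for j in neighbours
        )
        ratios.append(worst)
    return float(np.mean(ratios))


# Toy example: two clearly separated "topics" of three sentences each.
toy_embeddings = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
good_score = adjacent_db_score(toy_embeddings, [3])  # boundary at the topic shift
bad_score = adjacent_db_score(toy_embeddings, [1])   # boundary in the wrong place
```

In this sketch a lower score indicates a better segmentation, so the boundary placed at the topic shift scores lower than the misplaced one. Restricting the comparison to neighbouring segments is a design assumption made here because non-adjacent segments of a text may legitimately share a topic.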