Abstract: Text segmentation is a natural language processing task with popular applications, such as
topic segmentation, discourse element extraction, and sentence tokenization. Much work
has been done to develop accurate segmentation similarity metrics, but even the most advanced metrics used today, B and WindowDiff,
exhibit incorrect behavior due to their evaluation of boundaries in isolation. In this paper,
we present a new segment-alignment based approach to segmentation similarity scoring and
a new similarity metric A. We show that A
does not exhibit the erratic behavior of B and
WindowDiff, quantify the likelihood of B and
WindowDiff misbehaving through simulation,
and discuss the versatility of alignment-based
approaches for segmentation similarity scoring.
We make our implementation of A publicly
available in the hope that it will encourage the
community to explore more sophisticated approaches for text segmentation similarity scoring.