Flexible Text Segmentation with Structured Multilabel Classification

2005 (modified: 16 Jul 2019) | HLT/EMNLP 2005 | Readers: Everyone
Abstract: Many language processing tasks can be reduced to breaking the text into segments with prescribed properties. Such tasks include sentence splitting, tokenization, named-entity extraction, and chunking. We present a new model of text segmentation based on ideas from multilabel classification. Using this model, we can naturally represent segmentation problems involving overlapping and non-contiguous segments. We evaluate the model on entity extraction and noun-phrase chunking and show that it is more accurate for overlapping and non-contiguous segments, but it still performs well on simpler data sets for which sequential tagging has been the best method.
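
To make the representational point concrete, here is a minimal sketch of why one-tag-per-token sequential tagging struggles with overlapping and non-contiguous segments, while a set-of-positions view does not. This is not the paper's model: the `Segment` type, the `to_bio` helper, and the four-token example are all assumptions introduced for illustration only.

```python
from typing import FrozenSet, List

# Assumption: a segment is simply the set of token positions it covers.
Segment = FrozenSet[int]


def to_bio(n_tokens: int, segments: List[Segment]) -> List[str]:
    """Encode non-empty segments as one B/I/O tag per token.

    Raises ValueError when a segment is non-contiguous or overlaps
    another one, because a single tag per token cannot express either.
    """
    tags = ["O"] * n_tokens
    for seg in segments:
        positions = sorted(seg)
        # A contiguous segment covers every position between its ends.
        if positions != list(range(positions[0], positions[-1] + 1)):
            raise ValueError("non-contiguous segment")
        # Each token can carry only one tag, so segments must not share tokens.
        if any(tags[i] != "O" for i in positions):
            raise ValueError("overlapping segments")
        tags[positions[0]] = "B"
        for i in positions[1:]:
            tags[i] = "I"
    return tags


# Hypothetical example sentence, not taken from the paper's data.
tokens = ["lung", "and", "breast", "cancer"]
breast_cancer: Segment = frozenset({2, 3})  # contiguous
lung_cancer: Segment = frozenset({0, 3})    # non-contiguous, shares "cancer"

simple_tags = to_bio(len(tokens), [breast_cancer])
```

In this sketch the set representation holds both `breast_cancer` and `lung_cancer` without difficulty, whereas `to_bio` accepts the first and rejects the second. How the paper's multilabel model actually encodes and predicts such segments is described in the paper itself, not here.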