Hierarchical Text Segmentation from Multi-Scale Lexical CohesionDownload PDFOpen Website

2009 (modified: 12 Nov 2022)HLT-NAACL 2009Readers: Everyone
Abstract: This paper presents a novel unsupervised method for hierarchical topic segmentation. Lexical cohesion -- the workhorse of unsupervised linear segmentation -- is treated as a multi-scale phenomenon, and formalized in a Bayesian setting. Each word token is modeled as a draw from a pyramid of latent topic models, where the structure of the pyramid is constrained to induce a hierarchical segmentation. Inference takes the form of a coordinate-ascent algorithm, iterating between two steps: a novel dynamic program for obtaining the globally-optimal hierarchical segmentation, and collapsed variational Bayesian inference over the hidden variables. The resulting system is fast and accurate, and compares well against heuristic alternatives.
0 Replies

Loading