A multiple-model based framework for automatic speech segmentation

Published: 2007, Last Modified: 09 Jan 2026, INTERSPEECH 2007, CC BY-SA 4.0
Abstract: We propose a new approach to automatic speech segmentation for corpus-based speech synthesis. Instead of relying on a single automatic segmentation machine (ASM), we use multiple independent ASMs to obtain the final segmentation: given the independent time-marks produced by the various ASMs, we remove the bias of each time-mark and then compute the weighted sum of the bias-removed time-marks. The bias and weight parameters are estimated for each phonetic context through a training procedure that uses manually segmented results as references. The bias parameters are obtained by averaging the corresponding errors. The weight parameters are optimized simultaneously with the gradient projection method, which handles a set of constraints in the weight parameter space. A decision tree is employed to deal with unseen phonetic contexts. Experimental results show that the proposed method remarkably improves the segmentation accuracy.
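The combination step described in the abstract can be illustrated with a minimal Python sketch for a single phonetic context. Several details are assumptions, since the abstract does not state them: the constraints are taken to be non-negative weights that sum to one, the training objective is taken to be the mean squared error against the manual time-marks, and the function names (`estimate_biases`, `project_to_simplex`, `fit_weights`, `combine`) and the `step` and `iters` parameters are hypothetical. The decision tree for unseen contexts is not covered.

```python
def estimate_biases(auto_marks, ref_marks):
    """Average error of each ASM against the manual reference.

    auto_marks: one row per boundary, each row holding one time-mark per ASM.
    ref_marks:  one manually placed time-mark per boundary.
    """
    n = len(ref_marks)
    k = len(auto_marks[0])
    return [
        sum(row[j] - ref for row, ref in zip(auto_marks, ref_marks)) / n
        for j in range(k)
    ]


def project_to_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1}.

    Assumption: the abstract only mentions "a set of constraints";
    the probability simplex is used here as a plausible stand-in.
    """
    cumulative = 0.0
    theta = 0.0
    for i, u in enumerate(sorted(v, reverse=True), start=1):
        cumulative += u
        t = (cumulative - 1.0) / i
        if u - t > 0:
            theta = t  # keep the threshold from the last valid prefix
    return [max(x - theta, 0.0) for x in v]


def fit_weights(auto_marks, ref_marks, biases, step=0.5, iters=500):
    """Projected gradient descent on the mean squared combined error.

    The step size depends on the time unit of the marks and is an
    illustrative default, not a value from the paper.
    """
    k = len(biases)
    n = len(ref_marks)
    # Residual error of each ASM after its bias has been removed.
    errs = [
        [row[j] - biases[j] - ref for j in range(k)]
        for row, ref in zip(auto_marks, ref_marks)
    ]
    w = [1.0 / k] * k  # start from equal weights
    for _ in range(iters):
        grad = [0.0] * k
        for e in errs:
            combined = sum(wj * ej for wj, ej in zip(w, e))
            for j in range(k):
                grad[j] += 2.0 * combined * e[j] / n
        # Gradient step, then projection back onto the feasible set.
        w = project_to_simplex([wj - step * gj for wj, gj in zip(w, grad)])
    return w


def combine(marks, biases, weights):
    """Weighted sum of the bias-removed time-marks for one boundary."""
    return sum(w * (t - b) for t, b, w in zip(marks, biases, weights))
```

In this sketch, training would call `estimate_biases` and then `fit_weights` on the manually segmented data of one context, and `combine` would then be applied to each new boundary that falls in that context.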