Style estimation of speech based on multiple regression hidden semi-Markov model

Takashi Nose, Yoichi Kato, Takao Kobayashi

Published: 2007, Last Modified: 19 May 2025, INTERSPEECH 2007, CC BY-SA 4.0

Abstract: This paper presents a technique for estimating the degree or intensity of emotional expressions and speaking styles that appear in speech. The key idea builds on a style control technique for speech synthesis that uses a multiple regression hidden semi-Markov model (MRHSMM); the proposed technique can be viewed as the inverse process of style control. We derive an algorithm, based on a maximum likelihood (ML) criterion, for estimating the predictor variables of the MRHSMM, each of which represents the intensity of an emotion or the variability of a speaking style as it appears in the acoustic features. We also show preliminary experimental results that demonstrate the ability of the proposed technique on synthetic and acted speech samples with emotional expressions and speaking styles.
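
The abstract does not state the estimation formulas, so the following is only a rough sketch of the general idea, not the authors' algorithm. It assumes a heavily simplified setting: each state has a Gaussian output distribution whose mean is a linear function of the predictor (style) vector, mean = H_s [1, v], the frame-to-state alignment is already known, and state durations are ignored. Under those assumptions the ML estimate of v reduces to a weighted least-squares solve. The names `estimate_style_vector`, `H`, `cov`, and `states` are hypothetical.

```python
import numpy as np

def estimate_style_vector(obs, states, H, cov):
    """ML estimate of v when frame t has mean H[s_t] @ [1, v] (assumed model).

    obs:    (T, D) observed feature vectors
    states: (T,)   known state index per frame (an assumption of this sketch)
    H:      (S, D, L + 1) regression matrices; column 0 is the bias term
    cov:    (S, D, D) state covariance matrices
    """
    L = H.shape[2] - 1
    lhs = np.zeros((L, L))
    rhs = np.zeros(L)
    for o, s in zip(obs, states):
        b, A = H[s][:, 0], H[s][:, 1:]   # split bias and regression columns
        P = np.linalg.inv(cov[s])        # precision matrix of state s
        lhs += A.T @ P @ A               # accumulate normal equations
        rhs += A.T @ P @ (o - b)
    return np.linalg.solve(lhs, rhs)

# Synthetic check: generate frames from a known style vector and recover it.
rng = np.random.default_rng(0)
S, D, L, T = 2, 4, 2, 200
H = rng.standard_normal((S, D, L + 1))
cov = np.stack([0.01 * np.eye(D)] * S)
v_true = np.array([0.8, -0.3])
states = rng.integers(0, S, size=T)
xi = np.concatenate(([1.0], v_true))     # augmented vector [1, v]
obs = np.stack([H[s] @ xi for s in states]) + 0.1 * rng.standard_normal((T, D))

v_hat = estimate_style_vector(obs, states, H, cov)
print("true:", v_true, "estimated:", np.round(v_hat, 3))
```

In a full semi-Markov model the alignment and durations would not be given, and the estimate would presumably be computed with state occupancy probabilities inside an iterative procedure; this sketch leaves that out.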