TY - JOUR
T1 - Robust estimation of multiple-regression HMM parameters for dimension-based expressive dialogue speech synthesis
AU - Nagata, Tomohiro
AU - Mori, Hiroki
AU - Nose, Takashi
PY - 2013/1/1
Y1 - 2013/1/1
N2 - This paper describes spontaneous dialogue speech synthesis based on multiple-regression hidden semi-Markov model (MRHSMM), which enables users to specify paralinguistic information of synthesized speech with a dimensional representation. Paralinguistic aspects of synthesized speech are controlled by multiple regression models whose explanatory variables are abstract dimensions such as pleasant-unpleasant and aroused-sleepy. For robust estimation of the regression matrices of the MRHSMM with unbalanced spontaneous dialogue speech samples, the re-estimation formulae were derived in the framework of the maximum a posteriori (MAP) estimation. The result of a perceptual experiment confirmed that the naturalness of synthesized speech was improved by applying the MAP estimation for regression matrices. In addition a high correlation (R ≃ 0.7) was observed between given and perceived paralinguistic information, which implies that the proposed method could successfully reflect intended paralinguistic messages on the synthesized speech.
AB - This paper describes spontaneous dialogue speech synthesis based on multiple-regression hidden semi-Markov model (MRHSMM), which enables users to specify paralinguistic information of synthesized speech with a dimensional representation. Paralinguistic aspects of synthesized speech are controlled by multiple regression models whose explanatory variables are abstract dimensions such as pleasant-unpleasant and aroused-sleepy. For robust estimation of the regression matrices of the MRHSMM with unbalanced spontaneous dialogue speech samples, the re-estimation formulae were derived in the framework of the maximum a posteriori (MAP) estimation. The result of a perceptual experiment confirmed that the naturalness of synthesized speech was improved by applying the MAP estimation for regression matrices. In addition a high correlation (R ≃ 0.7) was observed between given and perceived paralinguistic information, which implies that the proposed method could successfully reflect intended paralinguistic messages on the synthesized speech.
KW - HMM-based speech synthesis
KW - MAP estimation
KW - MRHSMM
KW - Paralinguistic information
KW - Spontaneous speech
KW - UU database
UR - http://www.scopus.com/inward/record.url?scp=84906230629&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84906230629&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:84906230629
SP - 1549
EP - 1553
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SN - 2308-457X
T2 - 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013
Y2 - 25 August 2013 through 29 August 2013
ER -