TY - GEN
T1 - Analysis on the importance of short-term speech parameterizations for emotional statistical parametric speech synthesis
AU - Maia, Ranniery
AU - Akamine, Masami
PY - 2012
Y1 - 2012
N2 - This paper presents a study on the importance of short-term spectral and excitation parameterizations for emotional hidden Markov model (HMM)-based speech synthesis. The analysis is performed through an emotion classification task by using two methods: K-means emotion clustering and Gaussian Mixture Models (GMMs)-based emotion identification. Two known forms of parameterization for the short-term speech spectral envelope, the mel-cepstrum and the mel-line spectrum pairs, are utilized, while features derived from the complex cepstrum and group delay, and band-aperiodicity coefficients, are used as excitation parameters. The emotion-dependent features are then selected according to the classification performance to train emotion-dependent HMM-based synthesizers. Listening tests are performed to verify the impact of the parameters on the similarity of the synthesized speech with its natural version.
AB - This paper presents a study on the importance of short-term spectral and excitation parameterizations for emotional hidden Markov model (HMM)-based speech synthesis. The analysis is performed through an emotion classification task by using two methods: K-means emotion clustering and Gaussian Mixture Models (GMMs)-based emotion identification. Two known forms of parameterization for the short-term speech spectral envelope, the mel-cepstrum and the mel-line spectrum pairs, are utilized, while features derived from the complex cepstrum and group delay, and band-aperiodicity coefficients, are used as excitation parameters. The emotion-dependent features are then selected according to the classification performance to train emotion-dependent HMM-based synthesizers. Listening tests are performed to verify the impact of the parameters on the similarity of the synthesized speech with its natural version.
KW - Expressive speech synthesis
KW - Speech synthesis
KW - Statistical parametric speech synthesis
UR - http://www.scopus.com/inward/record.url?scp=84878387086&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84878387086&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84878387086
SN - 9781622767595
T3 - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
SP - 1630
EP - 1633
BT - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
T2 - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Y2 - 9 September 2012 through 13 September 2012
ER -