TY - GEN
T1 - Speech factorization for HMM-TTS based on cluster adaptive training
AU - Latorre, Javier
AU - Wan, Vincent
AU - Gales, Mark J.F.
AU - Chen, Langzhou
AU - Chin, K. K.
AU - Knill, Kate
AU - Akamine, Masami
PY - 2012/12/1
Y1 - 2012/12/1
AB - This paper presents a novel approach to factorize and control different speech factors in HMM-based TTS systems. In this paper cluster adaptive training (CAT) is used to factorize speaker identity and expressiveness (i.e. emotion). Within a CAT framework, each speech factor can be modelled by a different set of clusters. Users can control speaker identity and expressiveness independently by modifying the weights associated with each set. These weights are defined in a continuous space, so variations of speaker and emotion are also continuous. Additionally, given a speaker which has only neutral-style training data, the approach is able to synthesise speech with that speaker's voice and different expressions. Lastly, the paper discusses how generalization of the basic factorization concept could allow the production of expressive speech from neutral voices for other HMM-TTS systems not based on CAT.
KW - Cluster adaptive training
KW - Expressive synthesis
KW - Speech factorization
KW - Speech synthesis
UR - http://www.scopus.com/inward/record.url?scp=84878418697&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84878418697&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84878418697
SN - 9781622767595
T3 - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
SP - 970
EP - 973
BT - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
T2 - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Y2 - 9 September 2012 through 13 September 2012
ER -