TY - GEN
T1 - Speaker-independent HMM-based voice conversion using quantized fundamental frequency
AU - Nose, Takashi
AU - Kobayashi, Takao
N1 - Funding Information: Part of this work was supported by JSPS Grant-in-Aid for Scientific Research 21300063 and 21800020.
PY - 2010
Y1 - 2010
AB - This paper proposes a segment-based voice conversion technique between arbitrary speakers that requires only a small amount of training data. In the proposed technique, an input speech utterance of the source speaker is decoded into phonetic and prosodic symbol sequences, and the converted speech is then generated from the pre-trained HMM of the target speaker using the decoded information. To reduce the amount of training data required, we use a speaker-independent model to decode the input speech and apply model adaptation to train the target speaker's model. Experimental results show that no training data from the source speaker is needed, and that the proposed technique, using only ten sentences of adaptation data from the target speaker, outperforms the conventional GMM-based technique trained on parallel data of 200 sentences.
KW - Average voice model
KW - HMM-based speech synthesis
KW - Segment-based mapping
KW - Speaker adaptation
KW - Voice conversion
UR - http://www.scopus.com/inward/record.url?scp=79959830930&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79959830930&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:79959830930
T3 - Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
SP - 1724
EP - 1727
BT - Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
PB - International Speech Communication Association
ER -