Speaker-independent HMM-based voice conversion using quantized fundamental frequency

Takashi Nose, Takao Kobayashi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

This paper proposes a segment-based voice conversion technique between arbitrary speakers that requires only a small amount of training data. In the proposed technique, an input speech utterance of the source speaker is decoded into phonetic and prosodic symbol sequences, and the converted speech is then generated from the pre-trained target speaker's HMM using the decoded information. To reduce the required amount of training data, we use a speaker-independent model for decoding the input speech and model adaptation for training the target speaker's model. Experimental results show that no training data from the source speaker is needed, and that the proposed technique, using only ten sentences of adaptation data from the target speaker, outperforms the conventional GMM-based technique trained on 200 sentences of parallel data.
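To illustrate the "quantized fundamental frequency" idea from the title, the sketch below shows one plausible way to turn a per-frame F0 track into discrete prosodic symbols that a decoder could handle alongside phonetic labels. This is not the authors' implementation; the number of levels, the F0 range, and the uniform binning in the log-F0 domain are all assumptions made for illustration.

```python
import numpy as np

def quantize_f0(f0_hz, num_levels=8, fmin_hz=70.0, fmax_hz=400.0):
    """Map per-frame F0 values (Hz) to discrete prosodic symbols.

    Unvoiced frames (f0 <= 0) get symbol 0; voiced frames are
    quantized uniformly in the log-F0 domain into num_levels bins,
    giving symbols 1..num_levels. All parameter values here are
    illustrative, not taken from the paper.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    symbols = np.zeros(len(f0), dtype=int)      # 0 marks unvoiced frames
    voiced = f0 > 0
    log_f0 = np.log(np.clip(f0[voiced], fmin_hz, fmax_hz))
    lo, hi = np.log(fmin_hz), np.log(fmax_hz)
    bins = np.floor((log_f0 - lo) / (hi - lo) * num_levels).astype(int)
    symbols[voiced] = np.clip(bins, 0, num_levels - 1) + 1
    return symbols

# Example: a short F0 contour with one unvoiced frame in the middle
f0_track = [120.0, 130.0, 0.0, 180.0, 240.0]
print(quantize_f0(f0_track))   # [3 3 0 5 6]
```

Because the symbols are speaker-independent labels rather than raw F0 values, the target speaker's HMM can regenerate a prosody contour in its own range, which is consistent with the paper's goal of conversion between arbitrary speakers without parallel data.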

Original language: English
Title of host publication: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
Publisher: International Speech Communication Association
Pages: 1724-1727
Number of pages: 4
Publication status: Published - 2010
Externally published: Yes

Publication series

Name: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010

Keywords

  • Average voice model
  • HMM-based speech synthesis
  • Segment-based mapping
  • Speaker adaptation
  • Voice conversion

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing
