Conversational spontaneous speech synthesis using average voice model

Tomoki Koriyama, Takashi Nose, Takao Kobayashi

Research output: Contribution to conference (Paper)

5 Citations (Scopus)

Abstract

This paper describes conversational spontaneous speech synthesis based on hidden Markov models (HMMs). To reduce the amount of data required for model training, we utilize an average-voice-based speech synthesis framework, which has been shown to be effective for synthesizing speech in an arbitrary speaker's voice from a small amount of training data. We examine several kinds of average voice models trained on reading-style speech, conversation-style speech, or both. We also examine which utterance unit is appropriate for conversational speech synthesis. Experimental results show that the proposed two-stage model adaptation method improves the quality of synthetic conversational speech.

Original language: English
Pages: 853-856
Number of pages: 4
Publication status: Published - 2010 Dec 1
Externally published: Yes
Event: 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010 - Makuhari, Chiba, Japan
Duration: 2010 Sep 26 to 2010 Sep 30

Conference

Conference: 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010
Country: Japan
City: Makuhari, Chiba
Period: 2010 Sep 26 to 2010 Sep 30

Keywords

  • Average voice model
  • Conversational speech
  • HMM-based speech synthesis
  • Speaker adaptation
  • Spontaneous speech
  • Style adaptation

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing

Cite this

Koriyama, T., Nose, T., & Kobayashi, T. (2010). Conversational spontaneous speech synthesis using average voice model. 853-856. Paper presented at 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010, Makuhari, Chiba, Japan.