Evaluation of prosodic contextual factors for HMM-based speech synthesis

Shuji Yokomizo, Takashi Nose, Takao Kobayashi

Research output: Contribution to conference › Paper

8 Citations (Scopus)

Abstract

This paper explores the effect of prosodic contextual factors on speech synthesis based on hidden Markov models (HMMs). In HMM-based speech synthesis, a variety of contextual factors are taken into account during model training so that not only phonetic but also prosodic features are modeled. A baseline system uses a large number of contextual factors, and as a result the cost of parameter tying by context clustering becomes relatively high compared to that in speech recognition. We examine the choice of prosodic contexts using objective measures on English and Japanese speech data, which have different linguistic and prosodic characteristics. Experimental results show that more compact context sets give performance comparable or close to that of the conventional full context set.
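
To see why a large context set raises the clustering cost, note that each distinct combination of factor values yields a distinct full-context label, and decision-tree context clustering must partition this label space. The following minimal Python sketch is illustrative only: the factor names and value counts are hypothetical (loosely modeled on HTS-style full-context labels), not the context sets evaluated in the paper.

    # Hypothetical contextual factors and the number of values each can take.
    # Illustrative only; not the factors or counts used in the paper.
    FULL_CONTEXT = {
        "quinphone_identity": 50 ** 4,   # two preceding + two succeeding phones (toy)
        "position_in_syllable": 5,
        "syllable_stress": 3,
        "accent_type": 10,               # e.g. Japanese pitch-accent type
        "position_in_accent_phrase": 10,
        "phrase_length_in_syllables": 10,
        "position_in_utterance": 10,
    }

    # A compact prosodic context set keeps only a subset of the factors
    # (again an illustrative choice, not a set from the paper).
    COMPACT_CONTEXT = {
        k: FULL_CONTEXT[k]
        for k in ("quinphone_identity", "syllable_stress",
                  "accent_type", "position_in_accent_phrase")
    }

    def label_space(factors):
        """Upper bound on distinct full-context labels:
        the product of the factor value counts."""
        size = 1
        for n in factors.values():
            size *= n
        return size

    print("full context label space   :", label_space(FULL_CONTEXT))
    print("compact context label space:", label_space(COMPACT_CONTEXT))

Dropping three of the seven toy factors shrinks the label space by three orders of magnitude here, which is the intuition behind the paper's finding that compact context sets can retain near-full-context performance at lower clustering cost.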

Original language: English
Pages: 430-433
Number of pages: 4
Publication status: Published - 2010 Dec 1
Externally published: Yes
Event: 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010 - Makuhari, Chiba, Japan
Duration: 2010 Sep 26 - 2010 Sep 30

Conference

Conference: 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010
Country: Japan
City: Makuhari, Chiba
Period: 10/9/26 - 10/9/30

Keywords

  • Computation time
  • Context clustering
  • Contextual factor
  • HMM-based speech synthesis
  • Prosody modeling

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing

Cite this

Yokomizo, S., Nose, T., & Kobayashi, T. (2010). Evaluation of prosodic contextual factors for HMM-based speech synthesis. 430-433. Paper presented at 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010, Makuhari, Chiba, Japan.