HMM-based style control for expressive speech synthesis with arbitrary speaker's voice using model adaptation

Takashi Nose, Makoto Tachibana, Takao Kobayashi

Research output: Contribution to journal › Article

39 Citations (Scopus)

Abstract

This paper presents methods for controlling the intensity of emotional expressions and speaking styles in an arbitrary speaker's synthetic speech, using only a small amount of that speaker's speech data, in HMM-based speech synthesis. Model adaptation approaches are introduced into the style control technique based on the multiple-regression hidden semi-Markov model (MRHSMM). Two approaches are proposed for training a target speaker's MRHSMMs. The first is MRHSMM-based model adaptation, in which a pretrained MRHSMM is adapted to the target speaker. For this purpose, we formulate the MLLR adaptation algorithm for the MRHSMM. The second approach uses simultaneous adaptation of speaker and style from an average voice model to obtain the target speaker's style-dependent HSMMs, which are then used to initialize the MRHSMM. Subjective evaluation with 50 adaptation sentences per style shows that the proposed methods outperform conventional speaker-dependent model training given the same amount of target-speaker speech data.
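
The style control idea in the abstract can be illustrated with a small sketch. It assumes the usual multiple-regression form, in which a state's mean vector is a linear function of a low-dimensional style vector, and it uses the standard MLLR notion of an affine transform on means. The function names, the matrices H and A, the bias b, and all numbers below are hypothetical; the paper's actual MLLR formulation for the MRHSMM is not reproduced here, and the sketch simply applies the affine transform to the style-dependent mean.

```python
def mr_mean(H, style):
    """Style-dependent mean: H times the augmented style vector [1, v_1, ..., v_L]."""
    xi = [1.0] + list(style)  # prepend the constant term
    return [sum(h * x for h, x in zip(row, xi)) for row in H]


def affine_adapt(mu, A, b):
    """MLLR-style affine transform of a mean vector: A * mu + b (illustrative only)."""
    return [sum(a * m for a, m in zip(row, mu)) + bi for row, bi in zip(A, b)]


# Hypothetical regression matrix: 2-dimensional feature, 1-dimensional style space.
H = [[5.0, 1.0],
     [2.0, -0.5]]

neutral = mr_mean(H, [0.0])  # style intensity 0
full = mr_mean(H, [1.0])     # nominal style intensity
strong = mr_mean(H, [1.5])   # emphasized style (extrapolation beyond 1)

# Hypothetical speaker transform, assumed to be estimated from adaptation data.
A = [[1.0, 0.0],
     [0.0, 2.0]]
b = [0.5, 0.0]
adapted_full = affine_adapt(full, A, b)
```

Scaling the style vector moves the mean continuously between the neutral and styled settings, which is the sense in which the intensity of a style can be controlled at synthesis time.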

Original language: English
Pages (from-to): 489-497
Number of pages: 9
Journal: IEICE Transactions on Information and Systems
Volume: E92-D
Issue number: 3
Publication status: Published - 2009
Externally published: Yes

Keywords

  • Average voice model
  • Expressive speech
  • HMM-based speech synthesis
  • Model adaptation
  • Multiple-regression HSMM (MRHSMM)
  • Style control

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering
  • Artificial Intelligence
