Emotional transplant in statistical speech synthesis based on emotion additive model

Yamato Ohtani, Yu Nasu, Masahiro Morita, Masami Akamine

Research output: Contribution to journal › Conference article › peer-review

8 Citations (Scopus)

Abstract

This paper proposes a novel method to transplant emotions to a new speaker in statistical speech synthesis based on an emotion additive model (EAM), which represents the differences between emotional and neutral voices. The method trains the EAM using neutral and emotional speech data of multiple speakers and applies it to the neutral voice model of a new (target) speaker. Speech quality degrades to some extent because the speakers behind the EAM do not match the speaker of the target neutral voice model. To alleviate this mismatch, we introduce an eigenvoice technique into the framework. We build neutral voice models and EAMs using multiple speakers, and construct an eigenvoice space consisting of the neutral voice models and EAMs. To transplant the emotion to the target speaker, the proposed method estimates the eigenvoice weights for the target neutral speech data based on a maximum likelihood criterion. The EAM of the target speaker is obtained by applying the estimated weights to the EAM parameters of the eigenvoice space. Emotional speech is then generated using this EAM and the target neutral voice model. Experimental results show that the proposed method performs emotional speech synthesis with reasonable emotions and high speech quality.
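The weight estimation and transplant steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each voice model is reduced to a single mean vector, that covariances are identity (so the maximum likelihood weights reduce to a least-squares fit), and that the training speakers' models serve directly as the basis instead of a derived eigenvoice space. All function names (`build_eam`, `estimate_weights`, `transplant_emotion`) and all numbers are hypothetical.

```python
# Toy sketch of emotion transplant with an emotion additive model (EAM).
# Assumptions (not from the paper): one mean vector per model, identity
# covariances, and training speakers' models used directly as the basis.

def build_eam(emotional_mean, neutral_mean):
    """EAM = difference between emotional and neutral parameters."""
    return [e - n for e, n in zip(emotional_mean, neutral_mean)]

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def estimate_weights(neutral_basis, target_neutral):
    """Least-squares weights w with sum_k w[k] * neutral_basis[k] ~ target.
    With Gaussian models and identity covariance this is the ML estimate."""
    gram = [[dot(bi, bj) for bj in neutral_basis] for bi in neutral_basis]
    rhs = [dot(bi, target_neutral) for bi in neutral_basis]
    return solve_linear(gram, rhs)

def transplant_emotion(neutral_basis, eam_basis, target_neutral):
    """Estimate weights on the neutral part, reuse them on the EAM part,
    then add the resulting EAM to the target neutral parameters."""
    w = estimate_weights(neutral_basis, target_neutral)
    dim = len(target_neutral)
    target_eam = [sum(w[k] * eam_basis[k][d] for k in range(len(w)))
                  for d in range(dim)]
    emotional = [n + e for n, e in zip(target_neutral, target_eam)]
    return w, target_eam, emotional

# Two hypothetical training speakers with neutral and emotional means.
neutral_basis = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
emotional_means = [[1.5, 0.0, 2.0], [0.0, 1.2, 1.4]]
eam_basis = [build_eam(e, n) for e, n in zip(emotional_means, neutral_basis)]

# Target speaker: only neutral data is available.
target_neutral = [0.25, 0.75, 1.25]
weights, target_eam, target_emotional = transplant_emotion(
    neutral_basis, eam_basis, target_neutral)
```

The key design point the sketch tries to capture is that the weights are estimated only from the target's neutral data, while the same weights are then reused on the EAM side of the space, so no emotional recordings of the target speaker are needed.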

Original language: English
Pages (from-to): 274-278
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2015-January
Publication status: Published - 2015 Jan 1
Externally published: Yes
Event: 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 - Dresden, Germany
Duration: 2015 Sep 6 to 2015 Sep 10

Keywords

  • Eigenvoice
  • Emotional speech synthesis
  • Emotional transplant
  • Hidden Markov model
  • Speech synthesis

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
