HMM-based speaker characteristics emphasis using average voice model

Takashi Nose, Junichi Adada, Takao Kobayashi

Research output: Contribution to journal › Conference article › peer-review

1 Citation (Scopus)

Abstract

This paper presents a technique for controlling and emphasizing the speaker characteristics of synthetic speech. The key idea comes from how professional impersonators imitate voices: when imitating a voice, they effectively exaggerate the target speaker's voice characteristics. To model and control the degree of speaker characteristics, we use a speech synthesis framework based on the multiple-regression hidden semi-Markov model (MRHSMM). In MRHSMM, the mean parameters are given by a multiple regression on a low-dimensional control vector. The control vector represents how much the target speaker's model parameters differ from those of the average voice model. By changing the control vector at synthesis time, we can control the degree to which the target speaker's voice characteristics are expressed. Results of subjective experiments show that the speaker reproducibility of synthetic speech is improved by emphasizing speaker characteristics.
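The regression idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a scalar control value where 0 reproduces the average voice mean and 1 reproduces the target speaker's mean, so values above 1 exaggerate the difference. The function name `regressed_mean` and the toy numbers are hypothetical.

```python
def regressed_mean(avg_mean, target_mean, control):
    """Mean vector as a linear function of a scalar control value.

    Assumption for this sketch: the regression bias is the average-voice
    mean and the slope is the target-minus-average offset.
    """
    return [a + control * (t - a) for a, t in zip(avg_mean, target_mean)]


# Toy 3-dimensional mean vectors (made-up values, not real model parameters).
avg = [1.0, 2.0, 3.0]
tgt = [1.5, 1.0, 3.0]

for c in (0.0, 1.0, 1.5):
    # c = 0.0: average voice, c = 1.0: target speaker, c = 1.5: emphasized
    print(c, regressed_mean(avg, tgt, c))
```

In a real MRHSMM the regression matrices are estimated per state during training; the sketch only shows how moving the control value past the target's position pushes the mean further from the average voice.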

Original language: English
Pages (from-to): 2631-2634
Number of pages: 4
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - 2009 Nov 26
Externally published: Yes
Event: 10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009 - Brighton, United Kingdom
Duration: 2009 Sep 6 → 2009 Sep 10

Keywords

  • HMM-based speech synthesis
  • Multiple-regression HSMM
  • Speaker characteristics
  • Style control
  • Voice imitation

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Sensory Systems
