A style control technique for speech synthesis using multiple regression HSMM

Takashi Nose, Junichi Yamagishi, Takao Kobayashi

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

13 Citations (Scopus)

Abstract

This paper presents a technique for intuitively controlling the degree or intensity of speaking styles and emotional expressions in synthetic speech. The conventional style control technique, based on the multiple regression HMM (MRHMM), makes it difficult to control the phone duration of synthetic speech because an HMM has no explicit parameter that models phone duration appropriately. To overcome this problem, we use a multiple regression hidden semi-Markov model (MRHSMM), which has explicit state duration distributions, to control phone duration. Results of subjective tests show that duration control is important for style control of synthetic speech. We also compare the proposed technique with another control technique based on model interpolation.
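The abstract gives no formulas, so the following is only a minimal sketch of the general idea, assuming the usual multiple-regression formulation in which a distribution mean is a linear function of a low-dimensional style vector. The function `regress_mean`, the matrix `H_duration`, and all numeric values are hypothetical and are not taken from the paper. The sketch shows how a state duration mean could shift as the style intensity is varied.

```python
from typing import List, Sequence


def regress_mean(H: Sequence[Sequence[float]], style: Sequence[float]) -> List[float]:
    """Return H times [1, style...]: a mean that varies linearly with the style vector."""
    xi = [1.0, *style]  # augmented style vector; the leading 1 selects the bias column
    if any(len(row) != len(xi) for row in H):
        raise ValueError("each row of H needs 1 + len(style) columns")
    return [sum(h * x for h, x in zip(row, xi)) for row in H]


# Hypothetical regression matrix for one state's duration mean (in frames):
# bias of 10 frames, +3 frames per unit on style axis 0, -2 frames per unit on axis 1.
H_duration = [[10.0, 3.0, -2.0]]

neutral = regress_mean(H_duration, [0.0, 0.0])   # style vector at the origin
styled = regress_mean(H_duration, [1.0, 0.0])    # full intensity on style axis 0
stronger = regress_mean(H_duration, [1.5, 0.0])  # exaggerated intensity on axis 0
```

Under this assumption, moving the style vector away from the origin changes the duration mean continuously, which is the kind of control an explicit duration distribution makes possible and a plain HMM does not.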

Original language: English
Title of host publication: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Publisher: International Speech Communication Association
Pages: 1324-1327
Number of pages: 4
Volume: 3
ISBN (Print): 9781604234497
Publication status: Published - 2006 Jan 1
Externally published: Yes
Event: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP - Pittsburgh, PA, United States
Duration: 2006 Sep 17 → 2006 Sep 21

Other

Other: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Country/Territory: United States
City: Pittsburgh, PA
Period: 06/9/17 → 06/9/21

Keywords

  • Emotional expression
  • Hidden semi-Markov model
  • HMM-based speech synthesis
  • Multiple regression HMM
  • Speaking style

ASJC Scopus subject areas

  • Computer Science(all)
