Multilevel parametric-base F0 model for speech synthesis

Javier Latorre, Masami Akamine

Research output: Contribution to journal › Conference article › peer-review

38 Citations (Scopus)

Abstract

This paper proposes a new F0 model for speech synthesis based on a parameterization of the logF0 contour of each syllable. The parameterization consists of the N-order discrete cosine transform (DCT) of the contour plus some additional parameters, such as the gradient of the syllable average pitch. A statistical model of the syllable pitch contour is then created by clustering the parameterized vectors with a decision tree. Similar statistical models are also created for linguistic levels other than the syllable. For synthesis, the statistical model of each level is used to define a log-likelihood function for the input text. These functions are then weighted and summed into a global log-likelihood function, which is maximized with respect to the DCT coefficients of the syllable model. The final logF0 contour is obtained by applying the inverse transform to the syllable DCT coefficients. A subjective test showed a clear preference for the proposed model over our previous HMM-based baseline.
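The two main steps of the abstract, DCT parameterization of a syllable's logF0 contour and maximization of a weighted sum of per-level log-likelihoods, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes, purely for illustration, that each level's model is a diagonal Gaussian and that the higher level observes a linear function `A` of the syllable coefficients; the abstract specifies neither. All function names, `A`, and the weights `w_s` and `w_w` are hypothetical, and the additional parameters (such as the pitch gradient) and the decision-tree clustering are omitted.

```python
import numpy as np


def dct_matrix(n_frames, n_coefs):
    """Orthonormal DCT-II basis, shape (n_coefs, n_frames)."""
    t = np.arange(n_frames)
    k = np.arange(n_coefs)[:, None]
    basis = np.cos(np.pi * (t + 0.5) * k / n_frames)
    basis *= np.sqrt(2.0 / n_frames)
    basis[0] /= np.sqrt(2.0)  # scale the constant row so rows are orthonormal
    return basis


def parameterize(logf0, n_coefs):
    """Keep the first n_coefs DCT coefficients of one syllable's logF0."""
    logf0 = np.asarray(logf0, dtype=float)
    return dct_matrix(len(logf0), n_coefs) @ logf0


def reconstruct(coefs, n_frames):
    """Inverse transform: DCT coefficients back to a logF0 contour."""
    coefs = np.asarray(coefs, dtype=float)
    return dct_matrix(n_frames, len(coefs)).T @ coefs


def maximize_global(mu_s, var_s, A, mu_w, var_w, w_s=1.0, w_w=1.0):
    """Maximize w_s * logN(c; mu_s, var_s) + w_w * logN(A c; mu_w, var_w).

    ASSUMPTION: diagonal Gaussians and a linear map A between levels.
    Under these assumptions the objective is quadratic in c, so the
    maximum is the solution of a linear system.
    """
    p_s = w_s / np.asarray(var_s, dtype=float)  # weighted precisions
    p_w = w_w / np.asarray(var_w, dtype=float)
    A = np.asarray(A, dtype=float)
    lhs = np.diag(p_s) + A.T @ (p_w[:, None] * A)
    rhs = p_s * np.asarray(mu_s, dtype=float) + A.T @ (p_w * np.asarray(mu_w, dtype=float))
    return np.linalg.solve(lhs, rhs)


if __name__ == "__main__":
    contour = np.log([120.0, 130.0, 140.0, 135.0, 125.0, 118.0])
    coefs = parameterize(contour, n_coefs=3)
    smooth = reconstruct(coefs, n_frames=len(contour))
    print("max reconstruction error:", np.abs(smooth - contour).max())
```

With as many coefficients as frames the round trip is exact; with fewer, the reconstruction is a smoothed approximation of the contour. Setting `w_w = 0` in `maximize_global` returns the syllable-level mean, which shows how the weights trade off the levels against each other.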

Original language: English
Pages (from-to): 2274-2277
Number of pages: 4
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - 2008 Dec 1
Externally published: Yes
Event: INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association - Brisbane, QLD, Australia
Duration: 2008 Sep 22 - 2008 Sep 26

Keywords

  • Discrete cosine transform
  • HMM-based synthesis
  • Prosody
  • Speech synthesis

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Sensory Systems
