On the impact of excitation and spectral parameters for expressive statistical parametric speech synthesis

Ranniery Maia, Masami Akamine

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

This paper presents a study on the importance of short-term speech parameterizations for expressive statistical parametric synthesis. Assuming a source-filter model of speech production, the analysis covers spectral parameters, defined here as features that represent a minimum-phase synthesis filter, and excitation parameters, which are features used to construct the signal that is fed to the minimum-phase synthesis filter to generate speech. In the first part, different spectral and excitation parameters applicable to statistical parametric synthesis are tested to determine which are the most emotion dependent. The analysis is performed through two methods proposed to measure the relative emotion dependency of each feature: one based on K-means clustering and another based on Gaussian mixture modeling for emotion identification. Two commonly used parameterizations of the short-term speech spectral envelope, the Mel cepstrum and the Mel line spectrum pairs, are utilized. As excitation parameters, the anti-causal cepstrum, the time-smoothed group delay, and band-aperiodicity coefficients are considered. According to the analysis, the line spectrum pairs are the most emotion-dependent parameters. Among the excitation features, the band-aperiodicity coefficients present the highest correlation with the speaker's emotion. The most emotion-dependent parameters according to this analysis were then selected to train an expressive statistical parametric synthesizer using a speaker and language factorization framework. Subjective test results indicate that the considered spectral parameters have a greater impact on the emotion of the synthesized speech than the excitation ones.
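The abstract does not spell out how the K-means-based measure is computed, so the following is only a minimal sketch of one plausible reading: cluster the frames of a single feature stream without looking at the emotion labels, then score how well the clusters line up with the labels using cluster purity. The purity score, the function names `kmeans` and `cluster_purity`, and the toy one-dimensional data are all assumptions made for illustration; they are not taken from the paper and do not reproduce its results.

```python
import random
from collections import Counter


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length tuples.

    Returns the cluster index assigned to each point.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct points
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return assign


def cluster_purity(assign, labels):
    """Fraction of points that carry the majority label of their cluster."""
    majority_total = 0
    for c in set(assign):
        counts = Counter(lab for a, lab in zip(assign, labels) if a == c)
        majority_total += counts.most_common(1)[0][1]
    return majority_total / len(labels)


# Hypothetical toy data: one scalar feature value per frame.
labels = ["neutral"] * 4 + ["angry"] * 4

# Feature A (invented): values separate cleanly by emotion.
feature_a = [(0.0,), (0.1,), (0.2,), (0.3,), (10.0,), (10.1,), (10.2,), (10.3,)]
# Feature B (invented): both emotions occupy both value ranges.
feature_b = [(0.0,), (0.1,), (10.0,), (10.1,), (0.2,), (0.3,), (10.2,), (10.3,)]

purity_a = cluster_purity(kmeans(feature_a, k=2), labels)
purity_b = cluster_purity(kmeans(feature_b, k=2), labels)
print(purity_a, purity_b)
```

Under this reading, a feature whose clusters align with the emotion labels scores a purity near 1 and would be ranked as more emotion dependent, while a feature whose clusters mix the emotions scores close to chance.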

Original language: English
Pages (from-to): 1209-1232
Number of pages: 24
Journal: Computer Speech and Language
Volume: 28
Issue number: 5
Publication status: Published - 2014 Jan 1
Externally published: Yes

Keywords

  • Expressive speech synthesis
  • Speech parameterization
  • Speech synthesis
  • Statistical parametric speech synthesis

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Software
  • Human-Computer Interaction
