Complex cepstrum for statistical parametric speech synthesis

Ranniery Maia, Masami Akamine, Mark J.F. Gales

Research output: Contribution to journal, Article

19 Citations (Scopus)

Abstract

Statistical parametric synthesizers have typically relied on a simplified model of speech production. In this model, speech is generated using a minimum-phase filter, implemented from coefficients derived from spectral parameters, driven by a zero- or random-phase excitation signal. This excitation signal is usually constructed from fundamental frequencies and parameters used to control the balance between the periodicity and aperiodicity of the signal. The application of this approach to statistical parametric synthesis has partly been motivated by speech coding theory. However, in contrast to most real-time speech coders, parametric speech synthesizers do not require causality. This allows the standard simplified model to be extended to represent the natural mixed-phase characteristics of speech signals. This paper proposes the use of the complex cepstrum to model the mixed-phase characteristics of speech through the incorporation of phase information in statistical parametric synthesis. The phase information is contained in the anti-causal portion of the complex cepstrum. These parameters have a direct connection with the shape of the glottal pulse of the excitation signal. Phase parameters are extracted on a frame basis and are modeled in the same fashion as the minimum-phase synthesis filter parameters. At synthesis time, phase parameter trajectories are generated and used to modify the excitation signal. Experimental results show that the use of such complex cepstrum-based phase features results in better synthesized speech quality. Listening test results yield an average preference of 60% for the system with the proposed phase feature on both female and male voices.
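The split between causal and anti-causal cepstral coefficients mentioned in the abstract can be illustrated with a minimal sketch. This is a generic textbook computation of the complex cepstrum (log magnitude plus unwrapped phase, with the linear-phase term removed), not the extraction procedure used in the paper. The function names `complex_cepstrum` and `split_cepstrum`, the FFT size, and the toy signal are assumptions made for illustration only.

```python
import numpy as np


def complex_cepstrum(x, n_fft=1024):
    """Textbook complex cepstrum of a real frame x (illustrative sketch)."""
    X = np.fft.fft(x, n_fft)
    # Unwrap the phase so that the complex logarithm is continuous.
    phase = np.unwrap(np.angle(X))
    # Estimate and remove the linear-phase term (a pure delay), which would
    # otherwise prevent the phase from being periodic in frequency.
    delay = -int(np.round(phase[n_fft // 2] / np.pi))
    phase = phase + 2.0 * np.pi * delay * np.arange(n_fft) / n_fft
    log_spectrum = np.log(np.abs(X)) + 1j * phase
    return np.fft.ifft(log_spectrum).real, delay


def split_cepstrum(c):
    """Split an FFT-ordered cepstrum into c[0], causal and anti-causal parts."""
    half = len(c) // 2
    causal = c[1:half]        # quefrencies n = 1, 2, ...
    anticausal = c[:half:-1]  # quefrencies n = -1, -2, ...
    return c[0], causal, anticausal


# Toy mixed-phase signal (hypothetical): one zero inside the unit circle
# (minimum-phase factor) and one zero outside it (maximum-phase factor).
a, b = 0.5, 0.4
x = np.convolve([1.0, -a], [-b, 1.0])

c, delay = complex_cepstrum(x)
c0, causal, anticausal = split_cepstrum(c)
print("causal:     ", causal[:3])
print("anti-causal:", anticausal[:3])
```

For this toy signal, theory predicts causal coefficients of -a^n/n, contributed by the minimum-phase factor, and anti-causal coefficients of -b^n/n, contributed by the maximum-phase factor. A purely minimum-phase signal would leave the anti-causal part at zero, which is why that part carries the additional phase information.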

Original language: English
Pages (from-to): 606-618
Number of pages: 13
Journal: Speech Communication
Volume: 55
Issue number: 5
Publication status: Published - 2013 Jun 1
Externally published: Yes

Keywords

  • Cepstral analysis
  • Complex cepstrum
  • Glottal source models
  • Spectral analysis
  • Speech synthesis
  • Statistical parametric speech synthesis

ASJC Scopus subject areas

  • Software
  • Modelling and Simulation
  • Communication
  • Language and Linguistics
  • Linguistics and Language
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
