On the impact of excitation and spectral parameters for expressive statistical parametric speech synthesis

Ranniery Maia, Masami Akamine

Research output: Article, peer-reviewed

4 Citations (Scopus)


This paper presents a study on the importance of short-term speech parameterizations for expressive statistical parametric synthesis. Assuming a source-filter model of speech production, the analysis covers two groups of features. Spectral parameters are defined here as features that represent a minimum-phase synthesis filter. Excitation parameters are features used to construct the signal that is fed to that filter to generate speech. In the first part, spectral and excitation parameters applicable to statistical parametric synthesis are tested to determine which are the most emotion dependent. The analysis uses two methods proposed to measure the relative emotion dependency of each feature: one based on K-means clustering, and another based on Gaussian mixture modeling for emotion identification. Two commonly used parameterizations of the short-term spectral envelope are evaluated: the Mel cepstrum and Mel line spectral pairs. The excitation parameters considered are the anti-causal cepstrum, the time-smoothed group delay, and band-aperiodicity coefficients. According to the analysis, the line spectral pairs are the most emotion-dependent parameters. Among the excitation features, the band-aperiodicity coefficients show the highest correlation with the speaker's emotion. The most emotion-dependent parameters identified by this analysis were then used to train an expressive statistical parametric synthesizer within a speaker and language factorization framework. Subjective test results indicate that the spectral parameters considered have a greater impact on the emotion of the synthesized speech than the excitation parameters.
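The abstract does not say how the K-means-based measure turns clusters into an emotion-dependency score. The sketch below assumes one plausible formulation, which is not necessarily the authors'. Each feature's frames are clustered with K equal to the number of emotions, and cluster purity with respect to the emotion labels serves as the score. A feature whose frames group by emotion scores near 1, while one that ignores emotion stays near chance. The functions `kmeans`, `cluster_purity`, and `emotion_dependency` and the toy data are hypothetical illustrations.

```python
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100):
    """Lloyd's K-means; returns a cluster index for every frame.

    Farthest-point initialization keeps this toy example deterministic.
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the frame farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    assign = None
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_assign == assign:  # converged: assignments stopped changing
            break
        assign = new_assign
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def cluster_purity(assign, labels):
    """Fraction of frames that carry the majority emotion label of their cluster."""
    by_cluster = {}
    for a, label in zip(assign, labels):
        by_cluster.setdefault(a, []).append(label)
    hits = sum(Counter(ls).most_common(1)[0][1] for ls in by_cluster.values())
    return hits / len(labels)


def emotion_dependency(features, labels):
    """Hypothetical score (an assumption, not the paper's exact metric):
    purity of a K-means clustering with K = number of emotions."""
    k = len(set(labels))
    return cluster_purity(kmeans(features, k), labels)


# Toy 1-D frames (hypothetical): feat_a separates the two emotions, feat_b does not.
labels = ["neutral"] * 4 + ["angry"] * 4
feat_a = [(0.0,), (0.1,), (0.2,), (0.1,), (5.0,), (5.1,), (4.9,), (5.2,)]
feat_b = [(0.0,), (5.0,), (0.1,), (5.1,), (0.2,), (4.9,), (0.1,), (5.2,)]

score_a = emotion_dependency(feat_a, labels)
score_b = emotion_dependency(feat_b, labels)
print(score_a, score_b)  # 1.0 0.5
```

Farthest-point initialization is used here only to make the toy example deterministic. An analysis over real Mel cepstrum or band-aperiodicity frames would involve many frames and several random restarts. It would likely rely on a library implementation such as scikit-learn's `KMeans`.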

Journal: Computer Speech and Language
Publication status: Published - Sep 2014

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Human-Computer Interaction

