This paper analyzes a problem with the spectral enhancement technique based on global variance (GV) in HMM-based speech synthesis. In conventional GV-based parameter generation, spectral enhancement with variance compensation is achieved by applying a GV pdf with fixed parameters to every output utterance throughout the generation process. Although the spectral peaks of the generated trajectory are clearly emphasized and subjective clarity improves, the fixed GV parameters make the GVs vary much less across synthesized utterances than they do in natural speech, which sometimes causes undesirable effects. We examine this problem using multiple objective measures, including variance characteristics, spectral and GV distortions, and GV correlations, and discuss the results. We then propose a simple alternative technique based on an affine transformation that yields a GV distribution closer to that of the original speech and improves the correlation of the GVs of the generated parameter sequences. Experimental results show that the proposed spectral enhancement outperforms conventional GV-based parameter generation on the objective measures.
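To make the idea of an affine, variance-adjusting transformation concrete, the following is a minimal sketch and not the paper's actual formulation. It assumes that the GV of an utterance is the per-dimension variance of the parameter trajectory over time, and that each dimension is rescaled around its own mean so that its GV matches a target value. The function names `global_variance` and `affine_enhance`, and the way the target GV is supplied, are hypothetical; the abstract does not specify how the target is chosen per utterance.

```python
from typing import List, Sequence


def global_variance(traj: Sequence[Sequence[float]]) -> List[float]:
    """Per-dimension variance over time for one utterance (T frames x D dims)."""
    n_frames = len(traj)
    n_dims = len(traj[0])
    means = [sum(frame[d] for frame in traj) / n_frames for d in range(n_dims)]
    return [
        sum((frame[d] - means[d]) ** 2 for frame in traj) / n_frames
        for d in range(n_dims)
    ]


def affine_enhance(
    traj: Sequence[Sequence[float]], target_gv: Sequence[float]
) -> List[List[float]]:
    """Scale each dimension around its mean so its GV equals target_gv[d].

    Assumption: this is a generic variance-scaling transform,
    y' = sqrt(target / gv) * (y - mean) + mean, used only for illustration.
    """
    n_frames = len(traj)
    n_dims = len(traj[0])
    means = [sum(frame[d] for frame in traj) / n_frames for d in range(n_dims)]
    gv = global_variance(traj)
    # Leave a dimension unchanged if it has zero variance (nothing to scale).
    scales = [
        (target_gv[d] / gv[d]) ** 0.5 if gv[d] > 0.0 else 1.0
        for d in range(n_dims)
    ]
    return [
        [scales[d] * (frame[d] - means[d]) + means[d] for d in range(n_dims)]
        for frame in traj
    ]


if __name__ == "__main__":
    # Toy 3-frame, 2-dimensional trajectory; values are made up.
    generated = [[0.0, 1.0], [2.0, 1.5], [4.0, 2.0]]
    enhanced = affine_enhance(generated, target_gv=[6.0, 0.5])
    print(global_variance(generated))
    print(global_variance(enhanced))
```

Because the transform only shifts and scales each dimension, the per-dimension mean of the trajectory is preserved while its variance is moved to the target. Supplying a different target per utterance, rather than one fixed value, is what would let the GVs of synthesized utterances vary as they do in natural speech.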
|Journal||Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH|
|Publication status||Published - 1 Jan 2014|
|Event||15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 - Singapore, Singapore|
Duration: 14 Sep 2014 → 18 Sep 2014