Sub-band basis spectrum model for pitch-synchronous log-spectrum and phase based on approximation of sparse coding

Masatsune Tamura, Takehiko Kagoshima, Masami Akamine

Research output: Conference contribution

3 citations (Scopus)

Abstract

In this paper, we propose a sub-band basis spectrum model, a new spectrum representation model based on a linear combination of sub-band basis vectors. We apply sparse coding to pitch-synchronously analyzed log-spectra. By approximating the resulting basis, we obtain sub-band basis vectors with 1-cycle sinusoidal shapes, spaced on the mel scale at lower frequencies and equally spaced at higher frequencies. The parameters of the sub-band basis spectrum model, which represent the log-spectrum and the phase spectrum, are calculated by fitting the basis to the spectrum. Since the parameters represent the shape of a spectrum, they can easily be used for voice adaptation, interpolation and conversion. Experimental results show that analysis-synthesis speech based on the proposed model is close to the original speech, and that there is no significant difference between synthetic speech generated from an analysis-synthesis database and synthetic speech generated from the original database in unit-fusion based TTS [1].
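The abstract does not give the exact basis formula or fitting procedure, so the following is only a minimal sketch under stated assumptions of how a log-spectrum could be represented with sub-band basis vectors of this kind. It assumes raised-cosine bumps (one sinusoid cycle per basis vector, with each half stretched between neighboring center frequencies), the common `2595 * log10(1 + f / 700)` mel formula, a hypothetical 4 kHz boundary between mel spacing and uniform spacing, and plain least-squares fitting. All function names (`center_frequencies`, `subband_basis`, `fit_parameters`, `reconstruct`) are hypothetical, and phase-spectrum fitting is omitted. This is not the authors' implementation.

```python
import numpy as np


def hz_to_mel(f_hz):
    # Common mel-scale formula; an assumption here, not taken from the paper.
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def center_frequencies(n_mel, n_linear, f_boundary, f_nyquist):
    """Hypothetical centers: mel-spaced from 0 Hz to f_boundary, then equally spaced up to Nyquist."""
    low = mel_to_hz(np.linspace(0.0, hz_to_mel(f_boundary), n_mel))
    high = np.linspace(f_boundary, f_nyquist, n_linear + 1)[1:]  # skip the duplicated boundary
    return np.concatenate([low, high])


def subband_basis(centers, freqs):
    """One raised-cosine bump per center (assumed shape).

    Row i rises from 0 at centers[i-1] to 1 at centers[i] and falls back to 0
    at centers[i+1], so adjacent rows sum to 1 between their centers.
    """
    n = len(centers)
    basis = np.zeros((n, len(freqs)))
    for i, c in enumerate(centers):
        if i > 0:  # rising half-cycle on [centers[i-1], c]
            left = centers[i - 1]
            m = (freqs >= left) & (freqs <= c)
            basis[i, m] = 0.5 - 0.5 * np.cos(np.pi * (freqs[m] - left) / (c - left))
        if i + 1 < n:  # falling half-cycle on [c, centers[i+1]]
            right = centers[i + 1]
            m = (freqs >= c) & (freqs <= right)
            basis[i, m] = 0.5 + 0.5 * np.cos(np.pi * (freqs[m] - c) / (right - c))
    return basis


def fit_parameters(basis, spectrum):
    """Least-squares weights w minimizing ||basis.T @ w - spectrum||^2 (assumed criterion)."""
    weights, *_ = np.linalg.lstsq(basis.T, spectrum, rcond=None)
    return weights


def reconstruct(basis, weights):
    # The model spectrum is a linear combination of the basis vectors.
    return basis.T @ weights


if __name__ == "__main__":
    # Hypothetical settings: 16 kHz audio, 512-point FFT, 30 mel + 10 uniform sub-bands.
    f_nyquist = 8000.0
    freqs = np.linspace(0.0, f_nyquist, 257)
    centers = center_frequencies(30, 10, 4000.0, f_nyquist)
    basis = subband_basis(centers, freqs)

    # Made-up smooth log-amplitude envelope with two formant-like peaks.
    log_spec = (2.0 * np.exp(-((freqs - 700.0) / 200.0) ** 2)
                + 1.5 * np.exp(-((freqs - 2500.0) / 400.0) ** 2)
                - 0.0003 * freqs)
    w = fit_parameters(basis, log_spec)
    rms = np.sqrt(np.mean((reconstruct(basis, w) - log_spec) ** 2))
    print(f"{len(w)} parameters, RMS fitting error {rms:.4f}")
```

In this sketch neighboring bumps overlap and sum to one, so a flat spectrum is reproduced exactly by equal weights, and each weight stays tied to a local frequency region. That locality is one plausible reason such parameters can be interpolated or modified directly, in the spirit of the voice adaptation and conversion uses mentioned above.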

Original language: English
Host publication title: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
Publisher: International Speech Communication Association
Pages: 2406-2409
Number of pages: 4
Publication status: Published - 2010
Externally published: Yes

Publication series

Name: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation
