Two-stage sequence-to-sequence neural voice conversion with low-to-high definition spectrogram mapping

Sou Miyamoto, Takashi Nose, Kazuyuki Hiroshiba, Yuri Odagiri, Akinori Ito

Research output: Conference contribution

1 citation (Scopus)

Abstract

In this study, we propose a voice conversion technique based on two-stage conversion, realized with two models: U-Net and pix2pix. Using U-Net, we aimed to reproduce the intonation of the target speaker by converting low-dimensional features while taking the time direction into account. We introduced pix2pix for the task of spectrogram enhancement. pix2pix is trained to map a low-definition spectrogram to a high-definition spectrogram (low-to-high spectrogram mapping). The low-definition spectrogram is reconstructed from the low-dimensional mel-cepstrum converted by U-Net, and the high-definition spectrogram is extracted from natural speech. In objective evaluations, the proposed method improved both the mel-cepstral distance (MCD) and the log F0 RMSE. Subjective evaluations showed that the proposed method improved speaker individuality to some extent while maintaining the same level of naturalness as the conventional method.
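The abstract reports results in terms of MCD and log F0 RMSE. The sketch below shows how these two objective measures are commonly computed; it is not the authors' evaluation code. It assumes the converted and reference mel-cepstra and F0 contours are already frame-aligned (for example, by dynamic time warping), that the 0th (energy) cepstral coefficient is excluded from MCD, and that log F0 RMSE is taken only over frames voiced in both contours. The function names are hypothetical, and the cepstral order, alignment method, and voicing handling used in the paper are not stated here.

```python
import numpy as np


def mel_cepstral_distance(mc_ref: np.ndarray, mc_conv: np.ndarray) -> float:
    """Mean MCD in dB between two frame-aligned mel-cepstrum sequences.

    Both arrays have shape (frames, order + 1). Coefficient 0 (energy) is
    excluded, a common convention that is assumed here.
    """
    if mc_ref.shape != mc_conv.shape:
        raise ValueError("sequences must be time-aligned with equal shapes")
    diff = mc_ref[:, 1:] - mc_conv[:, 1:]
    # Per-frame MCD: (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2)
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(per_frame))


def log_f0_rmse(f0_ref: np.ndarray, f0_conv: np.ndarray) -> float:
    """RMSE of natural-log F0 over frames voiced (F0 > 0) in both contours."""
    voiced = (f0_ref > 0) & (f0_conv > 0)
    if not np.any(voiced):
        raise ValueError("no frames are voiced in both contours")
    err = np.log(f0_ref[voiced]) - np.log(f0_conv[voiced])
    return float(np.sqrt(np.mean(err ** 2)))


if __name__ == "__main__":
    # Toy data: 3 frames of 25-dimensional mel-cepstra differing in c1 only.
    ref = np.zeros((3, 25))
    conv = ref.copy()
    conv[:, 1] = 1.0
    print(mel_cepstral_distance(ref, conv))
    # Unvoiced frames are marked with F0 = 0 and skipped.
    print(log_f0_rmse(np.array([100.0, 0.0, 200.0]), np.array([200.0, 150.0, 200.0])))
```

Excluding the energy term makes MCD reflect spectral shape rather than loudness, and restricting the F0 error to jointly voiced frames avoids taking the logarithm of zero.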

Original language: English
Host publication title: Recent Advances in Intelligent Information Hiding and Multimedia Signal Processing - Proceeding of the Fourteenth International Conference on Intelligent Information Hiding and Multimedia Signal Processing
Editors: Lakhmi C. Jain, Pei-Wei Tsai, Akinori Ito, Jeng-Shyang Pan
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 132-139
Number of pages: 8
ISBN (Print): 9783030037475
Publication status: Published - 2019
Event: 14th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2018 - Sendai, Japan
Duration: 26 Nov 2018 to 28 Nov 2018

Publication series

Name: Smart Innovation, Systems and Technologies
Volume: 110
ISSN (Print): 2190-3018
ISSN (Electronic): 2190-3026

Other

Conference: 14th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2018
Country: Japan
City: Sendai
Period: 26 Nov 2018 to 28 Nov 2018

ASJC Scopus subject areas

  • Decision Sciences (all)
  • Computer Science (all)
