Non-native speech conversion with consistency-aware recursive network and generative adversarial network

Keisuke Oyamada, Hirokazu Kameoka, Takuhiro Kaneko, Hiroyasu Ando, Kaoru Hiramatsu, Kunio Kashino

Research output: Conference contribution

11 Citations (Scopus)

Abstract

This paper deals with the problem of automatically correcting the pronunciation of non-native speakers. Since the pronunciation characteristics of non-native speakers depend heavily on the context (such as words), conversion rules for correcting pronunciation should be learned from a sequence of features rather than a single-frame feature. For the online conversion of local sequences of features, we construct a neural network (NN) that takes a sequence of features as both input and output, generates the feature sequence in a segment-by-segment fashion, and guarantees the consistency of the generated features within overlapping segments. Furthermore, we apply a recently proposed generative adversarial network (GAN)-based postfilter to the generated feature sequence with the aim of synthesizing natural-sounding speech. Through subjective and quantitative evaluations, we confirmed the superiority of our proposed method over a conventional NN approach in terms of conversion quality.

Original language: English
Title of host publication: Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 182-188
Number of pages: 7
ISBN (Electronic): 9781538615423
DOI
Publication status: Published - 5 Feb 2018
Externally published: Yes
Event: 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017 - Kuala Lumpur, Malaysia
Duration: 12 Dec 2017 to 15 Dec 2017

Publication series

Name: Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017
2018-February

Other

Other: 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017
Country/Territory: Malaysia
City: Kuala Lumpur
Period: 12 Dec 2017 to 15 Dec 2017

ASJC Scopus subject areas

  • Artificial Intelligence
  • Human-Computer Interaction
  • Information Systems
  • Signal Processing
