CycleGAN-Based High-Quality Non-Parallel Voice Conversion with Spectrogram and WaveRNN

Aoi Kanagaki, Masaya Tanaka, Takashi Nose, Ryohei Shimizu, Akira Ito, Akinori Ito

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This paper proposes Scyclone, a high-quality voice conversion (VC) technique that does not require parallel training data. Scyclone improves the naturalness and speaker similarity of converted speech by combining CycleGAN-based spectrogram conversion with a simplified WaveRNN-based vocoder. Scyclone uses a linear spectrogram as the conversion feature, which avoids the quality degradation caused by extraction errors. Subjective experiments show that Scyclone is significantly better than CycleGAN-VC2, one of the existing state-of-the-art parallel-data-free VC techniques.

Original language: English
Title of host publication: 2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 356-357
Number of pages: 2
ISBN (Electronic): 9781728198026
Publication status: Published - 2020 Oct 13
Event: 9th IEEE Global Conference on Consumer Electronics, GCCE 2020 - Kobe, Japan
Duration: 2020 Oct 13 to 2020 Oct 16

Publication series

Name: 2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020

Conference

Conference: 9th IEEE Global Conference on Consumer Electronics, GCCE 2020
Country: Japan
City: Kobe
Period: 20/10/13 to 20/10/16

Keywords

  • CycleGAN
  • parallel-data-free VC
  • spectrogram
  • Voice conversion (VC)
  • WaveRNN

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering
  • Media Technology
  • Instrumentation
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
