Unit selection speech synthesis using multiple speech units at non-adjacent segments for prosody and waveform generation

Masatsune Tamura, Norbert Braunschweiler, Takehiko Kagoshima, Masami Akamine

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

In this paper, we propose a speech synthesis method that combines a speech synthesis method based on natural waveform concatenation with our baseline plural unit selection and fusion method. The two main features of the proposed method are (i) prosody regeneration from selected speech units and (ii) the use of multiple speech units at non-adjacent segments. A non-adjacent segment is a segment whose previous or following speech unit in the optimum speech unit sequence is not adjacent to it in the database. By using the prosody of the selected speech units, the original prosodic expressions and sounds of the recorded speech are retained, while discontinuities are reduced by using multiple speech units at non-adjacent segments. Mean opinion score (MOS) evaluations showed that the proposed method provides a clear improvement over both the conventional unit selection method and our baseline method.
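
The definition of a non-adjacent segment can be illustrated with a minimal sketch. This is not the authors' implementation: the `Unit` record, its fields, and the boundary handling (a missing neighbour at either end of the sequence is not counted as a break) are assumptions made purely for illustration.

```python
from typing import List, NamedTuple


class Unit(NamedTuple):
    """A selected speech unit (hypothetical representation)."""
    utterance_id: str  # recorded utterance the unit was cut from
    position: int      # index of the unit within that utterance


def is_contiguous(left: Unit, right: Unit) -> bool:
    """True if `right` immediately follows `left` in the recorded database."""
    return (left.utterance_id == right.utterance_id
            and right.position == left.position + 1)


def non_adjacent_segments(sequence: List[Unit]) -> List[bool]:
    """Flag each segment whose previous or following selected unit
    is not its neighbour in the database."""
    flags = []
    for i, unit in enumerate(sequence):
        # Assumption: a missing neighbour at the sequence edge is not a break.
        prev_break = i > 0 and not is_contiguous(sequence[i - 1], unit)
        next_break = (i + 1 < len(sequence)
                      and not is_contiguous(unit, sequence[i + 1]))
        flags.append(prev_break or next_break)
    return flags


# Two runs of consecutive units joined at a concatenation point.
selected = [Unit("utt01", 4), Unit("utt01", 5), Unit("utt07", 2), Unit("utt07", 3)]
flags = non_adjacent_segments(selected)
print(flags)  # [False, True, True, False]
```

In this toy sequence only the two segments on either side of the join are flagged. According to the abstract, these are the segments where multiple speech units are used to reduce discontinuities.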

Original language: English
Title of host publication: 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Proceedings
Pages: 4802-4805
Number of pages: 4
Publication status: Published - 2010 Nov 8
Externally published: Yes
Event: 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Dallas, TX, United States
Duration: 2010 Mar 14 to 2010 Mar 19

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Other

Other: 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010
Country: United States
City: Dallas, TX
Period: 10/3/14 to 10/3/19

Keywords

  • Concatenative speech synthesis
  • Prosody generation
  • Unit fusion
  • Unit selection

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
