Abstract
This paper addresses the prosodic variability of spontaneous speech in HMM-based synthesis of spontaneous conversational speech. We propose an extended context set that includes information specific to spontaneous speech, derived from the annotation data embedded in a large-scale database of spontaneous Japanese. Objective and subjective evaluation experiments show the effectiveness of the newly introduced contexts. We also propose stopping criteria for decision-tree clustering to alleviate over-fitting. Experimental results show that restricting the size of each leaf node can improve the quality of synthetic speech.
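The leaf-size restriction mentioned in the abstract can be illustrated with a small sketch of greedy decision-tree context clustering. This is not the authors' implementation. It assumes scalar targets (for example, one log F0 value per sample) and uses a sum-of-squared-error cost as a stand-in for the likelihood-based criteria typically used in HMM-based synthesis. The context features (`tone`, `filler`) and the function names (`grow`, `leaves`) are hypothetical. A candidate split is rejected if either child would hold fewer than `min_leaf_size` samples, and splitting stops when the best admissible cost reduction falls below `min_gain`.

```python
# A minimal sketch (not the paper's method) of greedy decision-tree context
# clustering with a minimum-leaf-size stopping rule. The cost is the sum of
# squared errors on scalar targets, an assumed stand-in for likelihood/MDL.
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Context = Dict[str, str]          # hypothetical context label, e.g. {"tone": "H"}
Sample = Tuple[Context, float]    # (context, scalar target such as log F0)
Question = Tuple[str, Callable[[Context], bool]]


@dataclass
class Node:
    samples: List[Sample]
    question: Optional[str] = None  # None means the node is a leaf
    yes: Optional["Node"] = None
    no: Optional["Node"] = None


def sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def grow(samples: List[Sample], questions: List[Question],
         min_leaf_size: int, min_gain: float) -> Node:
    """Split greedily on the best question until a stopping rule fires."""
    if min_leaf_size < 1:
        raise ValueError("min_leaf_size must be at least 1")
    node = Node(samples)
    parent_cost = sse([v for _, v in samples])
    best = None  # (gain, question name, yes-samples, no-samples)
    for name, pred in questions:
        yes = [s for s in samples if pred(s[0])]
        no = [s for s in samples if not pred(s[0])]
        # Stopping rule 1: never create a child smaller than min_leaf_size.
        if len(yes) < min_leaf_size or len(no) < min_leaf_size:
            continue
        gain = parent_cost - sse([v for _, v in yes]) - sse([v for _, v in no])
        if best is None or gain > best[0]:
            best = (gain, name, yes, no)
    # Stopping rule 2: the best admissible split must reduce the cost enough.
    if best is None or best[0] < min_gain:
        return node
    _, name, yes, no = best
    node.question = name
    node.yes = grow(yes, questions, min_leaf_size, min_gain)
    node.no = grow(no, questions, min_leaf_size, min_gain)
    return node


def leaves(node: Node) -> List[Node]:
    """Collect the leaf nodes (the clustered contexts) of a tree."""
    if node.question is None:
        return [node]
    return leaves(node.yes) + leaves(node.no)


if __name__ == "__main__":
    # Toy data; the "tone" and "filler" features are hypothetical.
    data = [
        ({"tone": "H", "filler": "no"}, 5.0),
        ({"tone": "H", "filler": "no"}, 5.2),
        ({"tone": "H", "filler": "yes"}, 4.0),
        ({"tone": "L", "filler": "no"}, 3.0),
        ({"tone": "L", "filler": "no"}, 3.1),
        ({"tone": "L", "filler": "yes"}, 2.0),
    ]
    qs = [("tone==H", lambda c: c["tone"] == "H"),
          ("filler==yes", lambda c: c["filler"] == "yes")]
    print(len(leaves(grow(data, qs, min_leaf_size=3, min_gain=0.0))))  # 2
    print(len(leaves(grow(data, qs, min_leaf_size=1, min_gain=0.0))))  # 4
```

On the toy data, raising `min_leaf_size` from 1 to 3 shrinks the tree from four leaves to two: the `filler` split is no longer admissible because it would leave leaves with only one or two samples. Limiting how thinly the data can be partitioned in this way is one simple way to curb over-fitting to sparse contexts.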
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 2657-2660 |
Number of pages | 4 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Publication status | Published - 2011 Dec 1 |
Externally published | Yes |
Event | 12th Annual Conference of the International Speech Communication Association, INTERSPEECH 2011, Florence, Italy (2011 Aug 27 → 2011 Aug 31) |
Keywords
- CSJ
- Conversational speech
- HMM-based speech synthesis
- Prosodic context
- Spontaneous speech
- X-JToBI
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modelling and Simulation