Abstract
This paper presents an algorithm to detect moments of maximum excitation (MME) in speech. It assumes a model in which speech can be represented as a sequence of pulses located at the MME convolved with a time-varying minimum-phase impulse response. By considering that in the glottal cycle speech concentrates more energy at the MME than at other instants, the locations and amplitudes of the excitation pulses are determined through maximum likelihood estimation. The suggested approach provides a fully automatic and consistent method for the detection of MME in speech without relying on ad hoc procedures which usually do not work well across different speech styles without a required amount of adjustments. Experiments with speech parameterization, in the context of complex cepstrum analysis and synthesis, have shown that the proposed MME-based processing can improve signal to error reconstruction ratio up to 10%, when compared to the use of glottal closure instant estimations provided by a well-known algorithm.
Original language | English |
---|---|
Pages (from-to) | 603-607 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Volume | 2015-January |
Publication status | Published - 2015 Jan 1 |
Externally published | Yes |
Event | 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 - Dresden, Germany Duration: 2015 Sep 6 → 2015 Sep 10 |
Keywords
- Epoch detection
- Pitch marking
- Speech analysis
- Speech modeling
- Speech parameterization
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modelling and Simulation