TY - JOUR
T1 - A maximum likelihood approach to the detection of moments of maximum excitation and its application to high-quality speech parameterization
AU - Maia, Ranniery
AU - Stylianou, Yannis
AU - Akamine, Masami
PY - 2015/1/1
Y1 - 2015/1/1
N2 - This paper presents an algorithm to detect moments of maximum excitation (MME) in speech. It assumes a model in which speech can be represented as a sequence of pulses located at the MME convolved with a time-varying minimum-phase impulse response. By considering that in the glottal cycle speech concentrates more energy at the MME than at other instants, the locations and amplitudes of the excitation pulses are determined through maximum likelihood estimation. The suggested approach provides a fully automatic and consistent method for the detection of MME in speech without relying on ad hoc procedures which usually do not work well across different speech styles without a required amount of adjustments. Experiments with speech parameterization, in the context of complex cepstrum analysis and synthesis, have shown that the proposed MME-based processing can improve signal to error reconstruction ratio up to 10%, when compared to the use of glottal closure instant estimations provided by a well-known algorithm.
AB - This paper presents an algorithm to detect moments of maximum excitation (MME) in speech. It assumes a model in which speech can be represented as a sequence of pulses located at the MME convolved with a time-varying minimum-phase impulse response. By considering that in the glottal cycle speech concentrates more energy at the MME than at other instants, the locations and amplitudes of the excitation pulses are determined through maximum likelihood estimation. The suggested approach provides a fully automatic and consistent method for the detection of MME in speech without relying on ad hoc procedures which usually do not work well across different speech styles without a required amount of adjustments. Experiments with speech parameterization, in the context of complex cepstrum analysis and synthesis, have shown that the proposed MME-based processing can improve signal to error reconstruction ratio up to 10%, when compared to the use of glottal closure instant estimations provided by a well-known algorithm.
KW - Epoch detection
KW - Pitch marking
KW - Speech analysis
KW - Speech modeling
KW - Speech parameterization
UR - http://www.scopus.com/inward/record.url?scp=84959133965&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84959133965&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:84959133965
SN - 2308-457X
VL - 2015-January
SP - 603
EP - 607
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015
Y2 - 6 September 2015 through 10 September 2015
ER -