TY - GEN
T1 - Very low bit-rate F0 coding for phonetic vocoder using MSD-HMM with quantized F0 context
AU - Nose, Takashi
AU - Kobayashi, Takao
PY - 2011
Y1 - 2011
N2 - This paper presents a very low bit-rate F0 coding technique for a speaker-dependent phonetic vocoder based on the hidden Markov model (HMM) using quantized F0 context. In the proposed technique, the input F0 sequence is converted into an F0 symbol sequence at the phoneme level using scalar quantization. The quantized F0 symbols are used in the decoding process as the prosodic context for HMM-based speech synthesis. The synthetic speech is generated from the context-dependent labels and the input speaker's pre-trained HMMs by using the HMM-based parameter generation algorithm. By taking account of preceding and succeeding phonemes and F0 symbols as contextual factors, we can generate a smooth F0 trajectory similar to that of the original with only a small number of quantization bits. Experimental results demonstrate that the proposed technique can generate an F0 contour with acceptable quality even when the bit-rate is less than 50 bps.
AB - This paper presents a very low bit-rate F0 coding technique for a speaker-dependent phonetic vocoder based on the hidden Markov model (HMM) using quantized F0 context. In the proposed technique, the input F0 sequence is converted into an F0 symbol sequence at the phoneme level using scalar quantization. The quantized F0 symbols are used in the decoding process as the prosodic context for HMM-based speech synthesis. The synthetic speech is generated from the context-dependent labels and the input speaker's pre-trained HMMs by using the HMM-based parameter generation algorithm. By taking account of preceding and succeeding phonemes and F0 symbols as contextual factors, we can generate a smooth F0 trajectory similar to that of the original with only a small number of quantization bits. Experimental results demonstrate that the proposed technique can generate an F0 contour with acceptable quality even when the bit-rate is less than 50 bps.
KW - F0 context
KW - HMM-based speech synthesis
KW - multi-space distribution HMM
KW - phonetic vocoder
KW - very low bit-rate speech coding
UR - http://www.scopus.com/inward/record.url?scp=80051614021&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80051614021&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2011.5947538
DO - 10.1109/ICASSP.2011.5947538
M3 - Conference contribution
AN - SCOPUS:80051614021
SN - 9781457705397
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 5236
EP - 5239
BT - 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings
T2 - 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
Y2 - 22 May 2011 through 27 May 2011
ER -