TY - GEN
T1 - Multi-modal voice activity detection by embedding image features into speech signal
AU - Abe, Yohei
AU - Ito, Akinori
PY - 2013
Y1 - 2013
AB - Lip movement is closely related to speech because the lips move when we talk. The idea behind this work is to extract a lip movement feature from facial video and embed it into the speech signal using an information hiding technique. With the proposed framework, advanced speech communication can be provided using only the speech signal, which carries the lip movement features, without increasing the bitrate of the signal. In this paper, we present the basic framework of the method and apply the proposed method to multi-modal voice activity detection (VAD). In detection experiments using a support vector machine, we obtained better performance than audio-only VAD in a noisy environment. In addition, we investigated how embedding data into the speech signal affects sound quality and detection performance.
KW - audio-visual
KW - information hiding
KW - multi-modal
KW - voice activity detection (VAD)
UR - http://www.scopus.com/inward/record.url?scp=84904489372&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84904489372&partnerID=8YFLogxK
U2 - 10.1109/IIH-MSP.2013.76
DO - 10.1109/IIH-MSP.2013.76
M3 - Conference contribution
AN - SCOPUS:84904489372
SN - 9780769551203
T3 - Proceedings - 2013 9th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2013
SP - 271
EP - 274
BT - Proceedings - 2013 9th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2013
PB - IEEE Computer Society
T2 - 9th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2013
Y2 - 16 October 2013 through 18 October 2013
ER -