Multi-modal voice activity detection by embedding image features into speech signal

Yohei Abe, Akinori Ito

Research output: Conference contribution

1 citation (Scopus)

Abstract

Lip movement is closely related to speech, since the lips move when we talk. The idea behind this work is to extract lip movement features from facial video and embed them into the speech signal using an information hiding technique. With the proposed framework, advanced speech communication becomes possible using only a speech signal that carries the lip movement features, without increasing the bitrate of the signal. In this paper, we present the basic framework of the method and apply the proposed method to multi-modal voice activity detection (VAD). In a detection experiment using a support vector machine, the proposed method outperformed audio-only VAD in a noisy environment. In addition, we investigated how embedding data into the speech signal affects sound quality and detection performance.
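The core of the framework described above is hiding a visual feature inside the speech samples themselves so the bitrate does not grow. As a minimal sketch of that idea, the example below embeds a quantized lip-movement feature into the least-significant bits of a 16-bit PCM speech frame and recovers it at the receiver. This LSB scheme, the 8-bit feature, and the frame size are illustrative assumptions; the paper's actual information hiding method and feature encoding are not specified here.

```python
import numpy as np

def embed_lsb(frame, feature_bits):
    """Hide feature bits in the least-significant bits of 16-bit PCM samples.

    LSB substitution is a stand-in for the paper's (unspecified here)
    information hiding technique; the real scheme may differ.
    """
    out = frame.copy()
    n = len(feature_bits)
    out[:n] &= ~1            # clear the LSB of each carrier sample
    out[:n] |= feature_bits  # write one feature bit per sample
    return out

def extract_lsb(frame, n_bits):
    """Recover the hidden bits from the first n_bits samples."""
    return frame[:n_bits] & 1

# A hypothetical lip-movement feature quantized to 8 bits
# (e.g. a lip-opening measurement), LSB-first.
feature = 0b10110010
bits = np.array([(feature >> i) & 1 for i in range(8)], dtype=np.int16)

# One 160-sample speech frame (synthetic 200 Hz tone at 8 kHz, 16-bit PCM).
t = np.arange(160)
frame = (1000 * np.sin(2 * np.pi * 200 * t / 8000)).astype(np.int16)

stego = embed_lsb(frame, bits)
recovered = extract_lsb(stego, 8)

print(recovered.tolist())                  # the embedded bits come back intact
print(int(np.max(np.abs(stego - frame))))  # each sample changes by at most 1 LSB
```

Because only the least-significant bit of each carrier sample is touched, the distortion is bounded by one quantization step, which is why such schemes can carry side information with little audible degradation; a multi-modal VAD can then decode these bits and combine them with audio features.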

Original language: English
Host publication title: Proceedings - 2013 9th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2013
Publisher: IEEE Computer Society
Pages: 271-274
Number of pages: 4
ISBN (Print): 9780769551203
DOI
Publication status: Published - 1 Jan 2013
Event: 9th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2013 - Beijing, China
Duration: 16 Oct 2013 - 18 Oct 2013


ASJC Scopus subject areas

  • Artificial Intelligence
  • Information Systems
  • Signal Processing
