TY - JOUR
T1 - Effect of speed difference between time-expanded speech and moving image of talker's face on word intelligibility
AU - Sakamoto, Shuichi
AU - Tanaka, Akihiro
AU - Tsumura, Komi
AU - Suzuki, Yôiti
N1 - Funding Information:
Acknowledgements: This work was supported by a Grant-in-Aid for Specially Promoted Research No. 19001004 from MEXT Japan. The authors would like to thank Dr. Hideki Kawahara for permission to use the STRAIGHT vocoding method. The authors would also like to thank the members of the NHK Science and Technical Research Laboratories for their helpful comments on our research.
PY - 2008/12
Y1 - 2008/12
AB - This study investigated how asynchrony between a speech signal and the moving image of the talker's face, induced by time-expansion of the speech signal, affects speech intelligibility. A word intelligibility test was administered to younger listeners. Japanese 4-mora words were uttered by a female speaker. Each word was processed with the STRAIGHT software to expand the speech signal by 0 to 400 ms. These signals were combined with the moving image of the talker's face, which was kept at its original speed. The test was performed under three conditions: visual-only, auditory-only, and auditory-visual (AV). Results showed that intelligibility scores under the AV condition were significantly higher than those under the auditory-only condition, even when the speech signal was expanded by 400 ms. These results suggest that the moving image of the talker's face effectively enhances speech intelligibility if the lag between the speech signal and the moving image does not exceed 400 ms.
KW - Audio-visual interaction
KW - Lip-reading
KW - Moving image of talker's face
KW - Time-expanded speech
KW - Word intelligibility
UR - http://www.scopus.com/inward/record.url?scp=81255161588&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=81255161588&partnerID=8YFLogxK
U2 - 10.1007/s12193-009-0018-4
DO - 10.1007/s12193-009-0018-4
M3 - Article
AN - SCOPUS:81255161588
SN - 1783-7677
VL - 2
SP - 199
EP - 203
JO - Journal on Multimodal User Interfaces
JF - Journal on Multimodal User Interfaces
IS - 3
ER -