TY - GEN
T1 - Smile and laughter recognition using speech processing and face recognition from conversation video
AU - Ito, Akinori
AU - Wang, Xinyue
AU - Suzuki, Motoyuki
AU - Makino, Shozo
PY - 2005/12/1
Y1 - 2005/12/1
N2 - This paper describes a method to detect smiles and laughter sounds from video of natural dialogue. A smile is the most common facial expression observed in a dialogue. Detecting a user's smiles and laughter sounds can be useful for estimating the mental state of the user of a spoken-dialogue-based user interface. In addition, detecting laughter sounds can prevent the speech recognizer from misrecognizing laughter as meaningful words. In this paper, a method is proposed to detect smile expressions and laughter sounds robustly by combining an image-based facial expression recognition method and an audio-based laughter sound recognition method. The image-based method uses a feature vector based on feature point detection from face images. The method could detect smiling faces with recall and precision rates of more than 80%. Combining a GMM-based laughter sound recognizer with the image-based method improved the accuracy of laughter sound detection compared with methods that use image or sound only. As a result, recall and precision rates of more than 70% were obtained for laughter sound detection from natural conversation videos.
AB - This paper describes a method to detect smiles and laughter sounds from video of natural dialogue. A smile is the most common facial expression observed in a dialogue. Detecting a user's smiles and laughter sounds can be useful for estimating the mental state of the user of a spoken-dialogue-based user interface. In addition, detecting laughter sounds can prevent the speech recognizer from misrecognizing laughter as meaningful words. In this paper, a method is proposed to detect smile expressions and laughter sounds robustly by combining an image-based facial expression recognition method and an audio-based laughter sound recognition method. The image-based method uses a feature vector based on feature point detection from face images. The method could detect smiling faces with recall and precision rates of more than 80%. Combining a GMM-based laughter sound recognizer with the image-based method improved the accuracy of laughter sound detection compared with methods that use image or sound only. As a result, recall and precision rates of more than 70% were obtained for laughter sound detection from natural conversation videos.
UR - http://www.scopus.com/inward/record.url?scp=33745159644&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33745159644&partnerID=8YFLogxK
U2 - 10.1109/CW.2005.82
DO - 10.1109/CW.2005.82
M3 - Conference contribution
AN - SCOPUS:33745159644
SN - 0769523781
SN - 9780769523781
T3 - Proceedings - 2005 International Conference on Cyberworlds, CW 2005
SP - 437
EP - 444
BT - Proceedings - 2005 International Conference on Cyberworlds, CW 2005
T2 - 2005 International Conference on Cyberworlds, CW 2005
Y2 - 23 November 2005 through 25 November 2005
ER -