TY - GEN
T1 - A precise evaluation method of prosodic quality of non-native speakers using average voice and prosody substitution
AU - Prafianto, Hafiyan
AU - Nose, Takashi
AU - Ito, Akinori
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2017/2/7
Y1 - 2017/2/7
AB - We propose a method to improve the consistency of human evaluation of non-native speakers' utterances, with the capability to evaluate features such as accent and rhythm. In this method, human evaluators evaluate the accent and the rhythm independently by using an average voice model and prosody substitution. We also investigated the advantages of evaluating those features independently. We found that, when the prosodic features are not evaluated independently, the accent scores are affected by the goodness of the rhythm and vice versa. The correlation coefficient between the accent score and the rhythm score of identical utterances was 0.23 using the conventional method and -0.026 using the proposed method. This interference also leads to greater disagreement between the scores given by different evaluators. Using the conventional method, 23% of the evaluator pairs had an inter-evaluator correlation of the rhythm score above 0.5, while using the proposed method, 67% of the pairs had an inter-evaluator correlation above 0.5.
KW - Average voice
KW - CALL system
KW - CAPT system
KW - Evaluation of prosodic quality
KW - Prosody substitution
UR - http://www.scopus.com/inward/record.url?scp=85016097292&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85016097292&partnerID=8YFLogxK
U2 - 10.1109/ICALIP.2016.7846620
DO - 10.1109/ICALIP.2016.7846620
M3 - Conference contribution
AN - SCOPUS:85016097292
T3 - ICALIP 2016 - 2016 International Conference on Audio, Language and Image Processing - Proceedings
SP - 208
EP - 212
BT - ICALIP 2016 - 2016 International Conference on Audio, Language and Image Processing - Proceedings
A2 - Luo, Fa-Long
A2 - Yu, Xiaoqing
A2 - Wan, Wanggen
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 5th International Conference on Audio, Language and Image Processing, ICALIP 2016
Y2 - 11 July 2016 through 12 July 2016
ER -