We propose a method to improve the consistency of human evaluation of non-native speaker's utterance, with a capability to evaluate features such as accent and rhythm. In this method, human evaluators evaluate the accent and the rhythm independently by using average voice model and prosody substitution. We also investigated the advantages of evaluating those features independently. We found that, when the prosodic features are not evaluated independently, the accent scores are affected by the goodness of the rhythm and vice versa. The correlation coefficient of the accent score and the rhythm score of identical utterances was 0.23 using the conventional method and -0.026 using the proposed method. This also leads to greater disagreement between the scores given by different evaluators. Using the conventional method, 23% of the pairs between evaluators have their inter-evaluator correlation of the rhythm score more than 0.5, while using this proposed method, 67% of the pairs have the inter-evaluator correlation more than 0.5.