In this paper, character-based Japanese dictation method is proposed. This method is based on the kana and kanji string language model proposed by Ito et al. First, sentences in the training corpus are split into character-based units (CBUs). Then strings of CBUs (CBUSes) are chosen from the CBU corpus based on a statistical criterion. We examined three criteria for the CBUS selection. They are the frequency-based selection, the mutual-information based selection and their combination. From the experimental results, it was found that the combined method gave the best result (7.19% and 8.75% CBU error rates for the 20k and the 60k word vocabulary conditions, respectively) which was better than the ordinary word-based method (7.61% and 9.15% CBU error rates for the 20k and the 60k word vocabulary conditions, respectively).
|Number of pages||24|
|Journal||International Journal of Computer Processing of Oriental Languages|
|Publication status||Published - 2009|
ASJC Scopus subject areas
- Arts and Humanities(all)
- Computer Science(all)