Dictation of Japanese Speech Based on Kana and Kanji Character String

Akinori Ito, Hiroaki Kinno, Masaharu Katoh, Tetsuo Kosaka, Masaki Kohda

Research output: Contribution to journal › Article › peer-review

Abstract

In this paper, a character-based Japanese dictation method is proposed. The method is based on the kana and kanji string language model proposed by Ito et al. First, sentences in the training corpus are split into character-based units (CBUs). Then strings of CBUs (CBUSes) are chosen from the CBU corpus based on a statistical criterion. We examined three criteria for CBUS selection: frequency-based selection, mutual-information-based selection, and their combination. The experimental results showed that the combined method gave the best result (CBU error rates of 7.19% and 8.75% for the 20k and 60k word vocabulary conditions, respectively), which was better than the ordinary word-based method (CBU error rates of 7.61% and 9.15% for the same two conditions).
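The selection step can be illustrated with a small sketch. It is not the authors' implementation: it assumes that each character is one CBU, that candidate CBUSes are limited to adjacent pairs, and that the "combined" criterion means passing both a frequency threshold and a pointwise mutual information threshold. The function names and the `min_count` and `min_pmi` parameters are hypothetical, and the abstract does not specify any of these details.

```python
import math
from collections import Counter


def split_into_cbus(sentence):
    # Assumption: one non-space character corresponds to one CBU.
    return [ch for ch in sentence if not ch.isspace()]


def pair_statistics(corpus):
    # Count single CBUs and adjacent CBU pairs over the whole corpus.
    unigrams = Counter()
    bigrams = Counter()
    for sentence in corpus:
        units = split_into_cbus(sentence)
        unigrams.update(units)
        bigrams.update(zip(units, units[1:]))
    return unigrams, bigrams


def pmi(pair, unigrams, bigrams):
    # Pointwise mutual information: log( P(x, y) / (P(x) * P(y)) ).
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    p_xy = bigrams[pair] / n_bi
    p_x = unigrams[pair[0]] / n_uni
    p_y = unigrams[pair[1]] / n_uni
    return math.log(p_xy / (p_x * p_y))


def select_cbus_pairs(corpus, min_count=2, min_pmi=0.0):
    # Hypothetical combined criterion: keep a pair only if it is
    # frequent enough AND its PMI exceeds the threshold.
    unigrams, bigrams = pair_statistics(corpus)
    selected = []
    for pair, count in bigrams.items():
        if count >= min_count and pmi(pair, unigrams, bigrams) > min_pmi:
            selected.append("".join(pair))
    return sorted(selected)


if __name__ == "__main__":
    toy_corpus = ["東京に行く", "東京で働く", "京都に行く"]
    print(select_cbus_pairs(toy_corpus))
```

Setting `min_pmi` to negative infinity reduces this to frequency-only selection, and setting `min_count` to 1 reduces it to mutual-information-only selection, so the same sketch covers all three criteria under the stated assumptions.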

Original language: English
Pages (from-to): 75-98
Number of pages: 24
Journal: International Journal of Computer Processing of Oriental Languages
Volume: 22
Issue number: 1
Publication status: Published - 2009
Externally published: Yes

ASJC Scopus subject areas

  • Psychology(all)
  • Arts and Humanities(all)
  • Computer Science(all)
