Construction and evaluation of language models based on stochastic context-free grammar for speech recognition

Chiori Hori, Masaharu Katoh, Akinori Ito, Masaki Kohda

Research output: Contribution to journal (Article)

2 Citations (Scopus)

Abstract

This paper deals with the use of a stochastic context-free grammar (SCFG) for large vocabulary continuous speech recognition; in particular, it builds an SCFG with phrase-level dependency rules. Unlike n-gram models, an SCFG can describe not only local constraints but also global constraints that span the whole sentence, which makes language models with great expressive power possible. However, the SCFG parameters must be estimated with the inside-outside algorithm, whose computational cost is proportional to the third power of both the number of nonterminal symbols and the input string length. Because this cost makes extensive text corpora difficult to handle, the SCFG has hardly been applied as a language model for very large vocabulary continuous speech recognition. The proposed phrase-level dependency SCFG significantly reduces the computational load. In experiments with the EDR corpus, the proposed method proved effective. In experiments with the Mainichi corpus, a large-scale phrase-level dependency SCFG was built for a very large vocabulary continuous speech recognition system. Speech recognition tests with a vocabulary of about 5000 words showed that the proposed method alone did not match the performance of the trigram model; however, when it was used in combination with a trigram model, the error rate was reduced by 14% compared to the trigram model alone.
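
To illustrate where the cubic cost mentioned in the abstract comes from, the following is a minimal sketch of the inside pass of the inside-outside algorithm for a generic SCFG in Chomsky normal form. It is not the phrase-level dependency SCFG proposed in the paper; the function name, the dictionary-based rule encoding, and the toy grammar are all assumptions made for illustration only.

```python
from collections import defaultdict


def inside_probability(words, lexical, binary, start="S"):
    """Return P(words | grammar) for an SCFG in Chomsky normal form.

    lexical: dict mapping (A, word) -> P(A -> word)      (hypothetical encoding)
    binary:  dict mapping (A, B, C) -> P(A -> B C)       (hypothetical encoding)
    """
    n = len(words)
    # beta[i][j][A] = probability that nonterminal A derives words[i:j]
    beta = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]

    # Base case: spans of length 1 are covered by lexical rules.
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                beta[i][i + 1][a] += p

    # The loops over span length, start position, and split point give the
    # cubic dependence on string length; the loop over binary rules
    # contributes up to |N|^3 terms for |N| nonterminals.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                left, right = beta[i][k], beta[k][j]
                for (a, b, c), p in binary.items():
                    pb = left.get(b, 0.0)
                    pc = right.get(c, 0.0)
                    if pb and pc:
                        beta[i][j][a] += p * pb * pc

    return beta[0][n].get(start, 0.0)


# Toy grammar (illustrative only): S -> S S (0.4), S -> "a" (0.6)
toy_lexical = {("S", "a"): 0.6}
toy_binary = {("S", "S", "S"): 0.4}

if __name__ == "__main__":
    print(inside_probability(["a", "a", "a"], toy_lexical, toy_binary))
```

In the toy grammar, the string "a a a" has two parse trees, and the inside pass sums their probabilities. Full parameter estimation would also need the outside pass and a re-estimation step, which are omitted here.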

Original language: English
Pages (from-to): 48-59
Number of pages: 12
Journal: Systems and Computers in Japan
Volume: 33
Issue number: 13
Publication status: Published - 2002 Nov 30
Externally published: Yes

Keywords

  • Dependency grammar
  • Inside-outside algorithm
  • Language model
  • Speech recognition
  • Stochastic context-free grammar

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Information Systems
  • Hardware and Architecture
  • Computational Theory and Mathematics
