An algorithm for fast calculation of back-off n-gram probabilities with unigram rescaling

Masaharu Kato, Tetsuo Kosaka, Akinori Ito, Shozo Makino

Research output: Contribution to journal › Article

Abstract

Topic-based stochastic models such as probabilistic latent semantic analysis (PLSA) are good tools for adapting a language model to a specific domain using global context as a constraint. A probability given by a topic model is combined with an n-gram probability using the unigram rescaling scheme. One practical problem in applying PLSA to speech recognition is that calculating probabilities with PLSA is computationally expensive, which prevents the topic-based language model from being incorporated into the decoding process. In this paper, we propose an algorithm that calculates a back-off n-gram probability with unigram rescaling quickly, without any approximation. The algorithm drastically reduces the cost of computing the normalizing factor, requiring probability calculations only for words that appear in the current context. Experimental results showed that the proposed algorithm was more than 6000 times faster than the naive calculation method.
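The key observation behind such a speed-up can be sketched as follows (a minimal illustration, not the paper's exact algorithm; all model values below are hypothetical toy numbers). Under unigram rescaling, the normalizer Z(h, d) sums P(w|h) · P(w|d)/P(w) over the whole vocabulary. With a back-off n-gram, unseen continuations contribute α(h) · P(w) · P(w|d)/P(w) = α(h) · P(w|d), and since the topic-model probabilities sum to one, the entire unseen part collapses to a closed-form term. Only the words explicitly seen in the current context need summing:

```python
# Toy vocabulary, unigram model P(w), and topic-model P(w|d) (hypothetical data).
VOCAB = ["a", "b", "c", "d"]
P_uni = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
P_topic = {"a": 0.1, "b": 0.5, "c": 0.3, "d": 0.1}  # e.g. from PLSA

# Explicit bigram entries f(w|h) for one history h; other words back off.
f_bigram = {"a": 0.5, "b": 0.3}
# Standard back-off weight so the bigram distribution sums to one.
alpha = (1.0 - sum(f_bigram.values())) / (1.0 - sum(P_uni[w] for w in f_bigram))

def p_bigram(w):
    """Back-off bigram P(w|h): explicit entry if seen, else alpha * P(w)."""
    return f_bigram[w] if w in f_bigram else alpha * P_uni[w]

def scale(w):
    """Unigram-rescaling ratio r(w) = P_topic(w) / P_uni(w)."""
    return P_topic[w] / P_uni[w]

# Naive normalizer: a sum over the entire vocabulary.
Z_naive = sum(p_bigram(w) * scale(w) for w in VOCAB)

# Fast normalizer: sum_w alpha * P_uni(w) * r(w) = alpha * sum_w P_topic(w) = alpha,
# so only the explicitly seen continuations need correction terms.
Z_fast = alpha + sum((f_bigram[w] - alpha * P_uni[w]) * scale(w) for w in f_bigram)

print(Z_naive, Z_fast)  # the two values agree
```

Because the seen-continuation set is tiny compared to a realistic vocabulary (a handful of words versus tens of thousands), this turns an O(|V|) sum into an O(|seen|) one, which is consistent with the large speed-up the abstract reports.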

Original language: English
Journal: IAENG International Journal of Computer Science
Volume: 36
Issue number: 4
Publication status: Published - 2009 Nov 1

Keywords

  • Back-off smoothing
  • N-gram
  • Probabilistic latent semantic analysis
  • Unigram rescaling

ASJC Scopus subject areas

  • Computer Science(all)

