An algorithm for fast calculation of back-off n-gram probabilities with unigram rescaling

Masaharu Kato, Tetsuo Kosaka, Akinori Ito, Shozo Makino

Research output: Article › peer-review

Abstract

Topic-based stochastic models such as probabilistic latent semantic analysis (PLSA) are good tools for adapting a language model to a specific domain, using global context as a constraint. A probability given by a topic model is combined with an n-gram probability through the unigram rescaling scheme. One practical problem in applying PLSA to speech recognition is that calculating probabilities with PLSA is computationally expensive, which prevents the topic-based language model from being incorporated into the decoding process. In this paper, we propose an algorithm that quickly calculates a back-off n-gram probability with unigram rescaling, without any approximation. The algorithm drastically reduces the cost of computing the normalizing factor, requiring only the probabilities of words that appear in the current context. Experimental results showed that the proposed algorithm was more than 6000 times faster than the naive calculation method.
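The normalizing-factor idea in the abstract can be sketched for the bigram case. This is an illustrative reconstruction, not the paper's code: all names and the toy numbers are made up. With unigram rescaling, the adapted probability is P'(w|h) = P_ngram(w|h)·scale(w)/Z(h), where scale(w) = P_topic(w)/P_unigram(w). For a back-off bigram, the normalizer Z(h) can be rebuilt from a vocabulary-wide constant Z0 (computed once) by correcting only the few words that have an explicit bigram entry after history h, instead of summing over the whole vocabulary:

```python
# Hedged sketch of the back-off normalizer trick (illustrative names, toy data).

def rescaled_bigram_normalizer(seen, bow, unigram, scale, Z0):
    """Normalizer Z(h) for a unigram-rescaled back-off bigram.

    seen    : dict of explicit bigram probabilities P*(w|h) for history h
    bow     : back-off weight b(h)
    unigram : unigram probabilities P(w) (the back-off distribution)
    scale   : rescaling ratios P_topic(w) / P_unigram(w)
    Z0      : sum over the WHOLE vocabulary of P(w) * scale(w), precomputed
    """
    # Sum over only the (few) words with an explicit bigram entry ...
    seen_sum = sum(p * scale[w] for w, p in seen.items())
    # ... and subtract their unigram back-off contribution from Z0.
    backoff_sum = sum(unigram[w] * scale[w] for w in seen)
    return seen_sum + bow * (Z0 - backoff_sum)

# Toy model: vocabulary {a, b, c}; only "a" has an explicit bigram after h.
unigram = {"a": 0.5, "b": 0.3, "c": 0.2}
scale = {"a": 1.2, "b": 0.8, "c": 1.0}   # topic/unigram ratios (made up)
seen, bow = {"a": 0.6}, 0.8

Z0 = sum(unigram[w] * scale[w] for w in unigram)   # computed once, offline
fast = rescaled_bigram_normalizer(seen, bow, unigram, scale, Z0)

# Naive normalizer: loop over the entire vocabulary.
naive = sum((seen[w] if w in seen else bow * unigram[w]) * scale[w]
            for w in unigram)
assert abs(fast - naive) < 1e-12
```

The per-history cost is proportional to the number of explicit n-gram entries for that history rather than the vocabulary size, which is the source of the speed-up the abstract reports.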

Original language: English
Journal: IAENG International Journal of Computer Science
Volume: 36
Issue number: 4
Publication status: Published - 2009 Nov 1

ASJC Scopus subject areas

  • Computer Science(all)

