An unsupervised language model adaptation based on keyword clustering and query availability estimation

Akinori Ito, Yasutomo Kajiura, Shozo Makino, Motoyuki Suzuki

Research output: Conference contribution

2 Citations (Scopus)

Abstract

Language model (LM) adaptation using text data downloaded from the WWW is an efficient way to train a topic-specific LM. We are developing an unsupervised LM adaptation method that uses data from the Web. One key point of unsupervised Web-based LM adaptation is how to select the keywords that compose the search query. In this paper, we propose a new method of selecting keywords from keyword candidates, which uses a keyword clustering technique based on word similarities. The other key point is how to determine the number of pages to download for each query. We propose a method to estimate a "query availability," which is based on a small number of downloaded Web pages. The experimental results showed that determining the number of downloaded pages using the query availability was more effective than conventional methods that determine the number of pages empirically.
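The abstract does not give the details of either step, so the following is only an illustrative sketch and not the authors' method. It assumes that each keyword candidate comes with a sparse context vector, groups candidates by cosine similarity using a simple greedy pass, and then splits a page budget across queries in proportion to an availability score. The function names, the similarity threshold, and the proportional allocation rule are all hypothetical, and the availability score is treated as already estimated (for example, from a small sample of downloaded pages).

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_keywords(vectors, threshold=0.5):
    """Greedy single-pass clustering (an assumed stand-in technique).

    A keyword joins the first cluster whose seed (its first member) is at
    least `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []
    for word, vec in vectors.items():
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters


def allocate_pages(availability, total_pages):
    """Split a page budget across queries in proportion to their scores.

    `availability` maps a query string to a non-negative score; how that
    score is estimated is outside this sketch.
    """
    total = sum(availability.values())
    if total <= 0:
        return {query: 0 for query in availability}
    return {
        query: int(total_pages * score / total)
        for query, score in availability.items()
    }
```

With three candidates whose context vectors are {"vote": 2, "party": 1}, {"vote": 2, "count": 1}, and {"goal": 2, "team": 1}, the first two fall into one cluster and the third stands alone, so each cluster could contribute one query. A budget of 100 pages with scores 3.0 and 1.0 is split 75 to 25.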

Original language: English
Title of host publication: ICALIP 2008 - 2008 International Conference on Audio, Language and Image Processing, Proceedings
Pages: 1412-1418
Number of pages: 7
DOI
Publication status: Published - Sep 22, 2008
Event: ICALIP 2008 - 2008 International Conference on Audio, Language and Image Processing - Shanghai, China
Duration: Jul 7, 2008 to Jul 9, 2008

Publication series

Name: ICALIP 2008 - 2008 International Conference on Audio, Language and Image Processing, Proceedings

Other

Other: ICALIP 2008 - 2008 International Conference on Audio, Language and Image Processing
Country: China
City: Shanghai
Period: Jul 7, 2008 to Jul 9, 2008

ASJC Scopus subject areas

  • Computer Science Applications
  • Computer Vision and Pattern Recognition
