TY - GEN
T1 - An unsupervised language model adaptation based on keyword clustering and query availability estimation
AU - Ito, Akinori
AU - Kajiura, Yasutomo
AU - Makino, Shozo
AU - Suzuki, Motoyuki
PY - 2008
Y1 - 2008
N2 - Language model adaptation using text data downloaded from the WWW is an efficient way to train a topic-specific LM. We are developing an unsupervised LM adaptation method using data from the Web. One key point of unsupervised Web-based LM adaptation is how to select keywords to compose the search query. In this paper, we propose a new method of selecting keywords from keyword candidates, which uses a keyword clustering technique based on word similarities. The other key point is how to determine the number of pages to download for each query. We propose a method to estimate "query availability" from a small number of downloaded Web pages. The experimental results showed that determining the number of downloaded pages using query availability was more effective than conventional methods that determine the number of pages empirically.
AB - Language model adaptation using text data downloaded from the WWW is an efficient way to train a topic-specific LM. We are developing an unsupervised LM adaptation method using data from the Web. One key point of unsupervised Web-based LM adaptation is how to select keywords to compose the search query. In this paper, we propose a new method of selecting keywords from keyword candidates, which uses a keyword clustering technique based on word similarities. The other key point is how to determine the number of pages to download for each query. We propose a method to estimate "query availability" from a small number of downloaded Web pages. The experimental results showed that determining the number of downloaded pages using query availability was more effective than conventional methods that determine the number of pages empirically.
UR - http://www.scopus.com/inward/record.url?scp=51849111165&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=51849111165&partnerID=8YFLogxK
U2 - 10.1109/ICALIP.2008.4590103
DO - 10.1109/ICALIP.2008.4590103
M3 - Conference contribution
AN - SCOPUS:51849111165
SN - 9781424417230
T3 - ICALIP 2008 - 2008 International Conference on Audio, Language and Image Processing, Proceedings
SP - 1412
EP - 1418
BT - ICALIP 2008 - 2008 International Conference on Audio, Language and Image Processing, Proceedings
T2 - ICALIP 2008 - 2008 International Conference on Audio, Language and Image Processing
Y2 - 7 July 2008 through 9 July 2008
ER -