This paper proposes a method for the unsupervised learning of lexicons from pairs of a spoken utterance and an object as its meaning without any a priori linguistic knowledge other than a phoneme acoustic model. In order to obtain a lexicon, a statistical model of the joint probability of a spoken utterance and an object is learned based on the minimum description length principle. This model consists of a list of word phoneme sequences and three statistical models: the phoneme acoustic model, a word-bigram model, and a word meaning model. Experimental results show that the method can acquire acoustically, grammatically and semantically appropriate words with about 85% phoneme accuracy.
|ジャーナル||Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH|
|出版ステータス||Published - 2009 11月 26|
|イベント||10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009 - Brighton, United Kingdom|
継続期間: 2009 9月 6 → 2009 9月 10
ASJC Scopus subject areas