Semi-supervised lexicon mining from parenthetical expressions in monolingual web pages

Xianchao Wu, Naoaki Okazaki, Jun'ichi Tsujii

Research output: Conference contribution

5 Citations (Scopus)

Abstract

This paper presents a semi-supervised learning framework for mining Chinese-English lexicons from a large number of Chinese Web pages. The work is motivated by the observation that many Chinese neologisms are accompanied by their English translations in parentheses. We classify parenthetical translations into bilingual abbreviations, transliterations, and translations. A frequency-based term recognition approach is applied to extract bilingual abbreviations. A self-training algorithm is proposed for mining transliteration and translation lexicons; it exploits available lexicons at the morpheme level, i.e., phoneme correspondences in transliteration and grapheme (e.g., suffix, stem, and prefix) correspondences in translation. The experimental results verified the effectiveness of our approaches.
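As a rough illustration of the kind of raw input the abstract describes, the following minimal sketch collects Chinese text that is immediately followed by an English expression in parentheses and counts how often each pair occurs. It is not the authors' method: the regular expression, the function name `mine_candidates`, and the sample sentences are all assumptions made for this example, and the sketch makes no attempt to decide where the Chinese term actually begins or to classify a pair as an abbreviation, transliteration, or translation.

```python
import re
from collections import Counter

# Hypothetical pattern (an assumption, not from the paper): a run of CJK
# ideographs immediately followed by an English expression enclosed in
# ASCII or full-width parentheses.
PAREN = re.compile(
    r"([\u4e00-\u9fff]+)\s*[(（]\s*([A-Za-z][A-Za-z .\-]*?)\s*[)）]"
)

def mine_candidates(pages):
    """Count (Chinese left context, English parenthetical) pairs across pages."""
    counts = Counter()
    for text in pages:
        for zh, en in PAREN.findall(text):
            counts[(zh, en)] += 1
    return counts

# Illustrative sentences written for this sketch.
pages = [
    "他在世界贸易组织(WTO)工作。",
    "世界贸易组织（WTO）成立于1995年。",
]
counts = mine_candidates(pages)
for (zh, en), n in counts.items():
    print(zh, en, n)
```

In the first sample sentence the captured Chinese context includes the extra leading characters 他在, which shows why raw pattern matching alone is not enough and some further step is needed to find the correct left boundary of the term.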

Original language: English
Title of host publication: NAACL HLT 2009 - Human Language Technologies
Subtitle of host publication: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 424-432
Number of pages: 9
ISBN (Print): 9781932432411
DOI
Publication status: Published - 2009
Externally published: Yes
Event: Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2009 - Boulder, CO, United States
Duration: 2009-05-31 to 2009-06-05

Publication series

Name: NAACL HLT 2009 - Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Conference

Other

Other: Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2009
Country/Territory: United States
City: Boulder, CO
Period: 2009-05-31 to 2009-06-05

ASJC Scopus subject areas

  • Language and Linguistics
  • Social Sciences (miscellaneous)
