TY - GEN
T1 - Semi-supervised lexicon mining from parenthetical expressions in monolingual web pages
AU - Wu, Xianchao
AU - Okazaki, Naoaki
AU - Tsujii, Jun'ichi
PY - 2009
Y1 - 2009
N2 - This paper presents a semi-supervised learning framework for mining Chinese-English lexicons from a large number of Chinese Web pages. The work is motivated by the observation that many Chinese neologisms are accompanied by their English translations in parentheses. We classify parenthetical translations into bilingual abbreviations, transliterations, and translations. A frequency-based term recognition approach is applied for extracting bilingual abbreviations. A self-training algorithm is proposed for mining transliteration and translation lexicons. In this algorithm, we employ available lexicons at the morpheme level, i.e., phoneme correspondences in transliteration and grapheme (e.g., suffix, stem, and prefix) correspondences in translation. The experimental results verified the effectiveness of our approaches.
AB - This paper presents a semi-supervised learning framework for mining Chinese-English lexicons from a large number of Chinese Web pages. The work is motivated by the observation that many Chinese neologisms are accompanied by their English translations in parentheses. We classify parenthetical translations into bilingual abbreviations, transliterations, and translations. A frequency-based term recognition approach is applied for extracting bilingual abbreviations. A self-training algorithm is proposed for mining transliteration and translation lexicons. In this algorithm, we employ available lexicons at the morpheme level, i.e., phoneme correspondences in transliteration and grapheme (e.g., suffix, stem, and prefix) correspondences in translation. The experimental results verified the effectiveness of our approaches.
UR - http://www.scopus.com/inward/record.url?scp=84863337827&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863337827&partnerID=8YFLogxK
U2 - 10.3115/1620754.1620816
DO - 10.3115/1620754.1620816
M3 - Conference contribution
AN - SCOPUS:84863337827
SN - 9781932432411
T3 - NAACL HLT 2009 - Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Conference
SP - 424
EP - 432
BT - NAACL HLT 2009 - Human Language Technologies
PB - Association for Computational Linguistics (ACL)
T2 - Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2009
Y2 - 31 May 2009 through 5 June 2009
ER -