TY - GEN
T1 - A pipeline Japanese entity linking system with embedding features
AU - Zhou, Shuangshuang
AU - Matsuda, Koji
AU - Den, Zen
AU - Okazaki, Naoaki
AU - Inui, Kentaro
PY - 2016/1/1
Y1 - 2016/1/1
N2 - Entity linking (EL) is the task of connecting mentions in texts to entities in a large-scale knowledge base such as Wikipedia. In this paper, we present a pipeline system for Japanese EL which consists of two standard components, namely candidate generation and candidate ranking. We investigate several techniques for each component, using a recently developed Japanese EL corpus. For candidate generation, we find that a concept dictionary using anchor texts of Wikipedia is more effective than methods based on surface similarity. For candidate ranking, we verify that a set of features used in English EL is effective in Japanese EL as well. In addition, by using a corpus that links Japanese mentions to Japanese Wikipedia entries, we are able to get rich context information from Japanese Wikipedia articles, which benefits mention disambiguation. This was not directly possible with previous EL corpora, which associate mentions with English Wikipedia entities. We take advantage of this by exploring several embedding models that encode context information of Wikipedia entities, and show that they improve candidate ranking. As a whole, our system achieves 82.27% accuracy, significantly outperforming previous work.
AB - Entity linking (EL) is the task of connecting mentions in texts to entities in a large-scale knowledge base such as Wikipedia. In this paper, we present a pipeline system for Japanese EL which consists of two standard components, namely candidate generation and candidate ranking. We investigate several techniques for each component, using a recently developed Japanese EL corpus. For candidate generation, we find that a concept dictionary using anchor texts of Wikipedia is more effective than methods based on surface similarity. For candidate ranking, we verify that a set of features used in English EL is effective in Japanese EL as well. In addition, by using a corpus that links Japanese mentions to Japanese Wikipedia entries, we are able to get rich context information from Japanese Wikipedia articles, which benefits mention disambiguation. This was not directly possible with previous EL corpora, which associate mentions with English Wikipedia entities. We take advantage of this by exploring several embedding models that encode context information of Wikipedia entities, and show that they improve candidate ranking. As a whole, our system achieves 82.27% accuracy, significantly outperforming previous work.
UR - http://www.scopus.com/inward/record.url?scp=85015835103&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85015835103&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85015835103
T3 - Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation, PACLIC 2016
SP - 267
EP - 276
BT - Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation, PACLIC 2016
A2 - Park, Jong C.
A2 - Chung, Jin-Woo
PB - Institute for the Study of Language and Information
T2 - 30th Pacific Asia Conference on Language, Information and Computation, PACLIC 2016
Y2 - 28 October 2016 through 30 October 2016
ER -