TY - JOUR
T1 - Supervised approaches for Japanese Wikification
AU - Zhou, Shuangshuang
AU - Okazaki, Naoaki
AU - Matsuda, Koji
AU - Den, Zen
AU - Inui, Kentaro
N1 - Funding Information:
This work was partially supported by JSPS KAKENHI Grant Numbers 15H01702 and 15H05318, and JST, CREST.
Publisher Copyright:
© 2017 Information Processing Society of Japan.
PY - 2017
Y1 - 2017
AB - Wikification is the task of connecting mentions in texts to entities in a large-scale knowledge base, Wikipedia. In this paper, we present a pipeline system for Japanese Wikification that consists of two components, namely candidate generation and candidate ranking. We investigate several techniques for each component, using a recently developed Japanese Wikification corpus. For candidate generation, we find that a name dictionary using anchor texts of Wikipedia is more effective than other methods based on similarity of surface forms. For candidate ranking, we verify that a set of features used in English Wikification is effective in Japanese Wikification as well. In addition, by using a corpus that links mentions to Japanese Wikipedia entries instead of to English Wikipedia entries, we are able to acquire rich contextual information from Japanese Wikipedia articles, which leads to improvements in Japanese mention disambiguation. We exploit this advantage by exploring several embedding models that encode context information of Wikipedia entities. The experimental results demonstrate that these models improve candidate ranking. We also report the effect of each feature in detail. In sum, our system achieves 81.60% accuracy, significantly outperforming previous work.
KW - Entity linking
KW - Named entity disambiguation
KW - SVM
KW - Wikification
UR - http://www.scopus.com/inward/record.url?scp=85018471181&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85018471181&partnerID=8YFLogxK
U2 - 10.2197/ipsjjip.25.341
DO - 10.2197/ipsjjip.25.341
M3 - Article
AN - SCOPUS:85018471181
VL - 25
SP - 341
EP - 350
JO - Journal of Information Processing
JF - Journal of Information Processing
SN - 0387-5806
ER -