A pipeline Japanese entity linking system with embedding features

Shuangshuang Zhou, Koji Matsuda, Zen Den, Naoaki Okazaki, Kentaro Inui

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Entity linking (EL) is the task of connecting mentions in texts to entities in a large-scale knowledge base such as Wikipedia. In this paper, we present a pipeline system for Japanese EL which consists of two standard components, namely candidate generation and candidate ranking. We investigate several techniques for each component, using a recently developed Japanese EL corpus. For candidate generation, we find that a concept dictionary using anchor texts of Wikipedia is more effective than methods based on surface similarity. For candidate ranking, we verify that a set of features used in English EL is effective in Japanese EL as well. In addition, by using a corpus that links Japanese mentions to Japanese Wikipedia entries, we are able to obtain rich context information from Japanese Wikipedia articles, which benefits mention disambiguation. This was not directly possible with previous EL corpora, which associate mentions with English Wikipedia entities. We exploit this advantage by exploring several embedding models that encode context information of Wikipedia entities, and show that they improve candidate ranking. As a whole, our system achieves 82.27% accuracy, significantly outperforming previous work.
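The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy anchor data, the two-dimensional vectors, the function names, and the linear combination of an anchor-text prior with cosine similarity are all assumptions made for illustration. The record does not state which features or embedding models the paper actually uses.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy data: (anchor text, linked Wikipedia entity) pairs.
ANCHORS = [
    ("東北", "東北地方"),
    ("東北", "東北地方"),
    ("東北", "東北地方"),
    ("東北", "東北大学"),
]

# Hypothetical 2-dimensional entity embeddings (real ones would be learned).
ENTITY_VECS = {
    "東北地方": [1.0, 0.0],
    "東北大学": [0.0, 1.0],
}


def build_concept_dictionary(anchors):
    """Map each anchor text to a Counter of the entities it links to."""
    dictionary = defaultdict(Counter)
    for surface, entity in anchors:
        dictionary[surface][entity] += 1
    return dictionary


def generate_candidates(dictionary, mention):
    """Return {entity: prior}, where prior estimates P(entity | anchor text)."""
    counts = dictionary.get(mention, Counter())
    total = sum(counts.values())
    return {entity: n / total for entity, n in counts.items()}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(candidates, context_vec, entity_vecs, weight=0.5):
    """Sort candidates by a weighted mix of prior and embedding similarity."""
    scored = {
        entity: (1 - weight) * prior
        + weight * cosine(context_vec, entity_vecs[entity])
        for entity, prior in candidates.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


dictionary = build_concept_dictionary(ANCHORS)
candidates = generate_candidates(dictionary, "東北")
# Assumed context vector for a mention that appears in a university-related text.
ranking = rank_candidates(candidates, [0.1, 0.9], ENTITY_VECS)
print(ranking)
```

In this toy setup the anchor-text prior alone would favour 東北地方, while the context embedding shifts the top rank to 東北大学, which is the kind of disambiguation benefit the abstract attributes to context information.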

Original language: English
Title of host publication: Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation, PACLIC 2016
Editors: Jong C. Park, Jin-Woo Chung
Publisher: Institute for the Study of Language and Information
Pages: 267-276
Number of pages: 10
ISBN (Electronic): 9788968174285
Publication status: Published - 2016 Jan 1
Event: 30th Pacific Asia Conference on Language, Information and Computation, PACLIC 2016 - Seoul, Korea, Republic of
Duration: 2016 Oct 28 to 2016 Oct 30

Publication series

Name: Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation, PACLIC 2016

Other

Other: 30th Pacific Asia Conference on Language, Information and Computation, PACLIC 2016
Country: Korea, Republic of
City: Seoul
Period: 16/10/28 to 16/10/30

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science (miscellaneous)
  • Information Systems

Cite this

    Zhou, S., Matsuda, K., Den, Z., Okazaki, N., & Inui, K. (2016). A pipeline Japanese entity linking system with embedding features. In J. C. Park, & J-W. Chung (Eds.), Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation, PACLIC 2016 (pp. 267-276). (Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation, PACLIC 2016). Institute for the Study of Language and Information.