Graph-structured conditional random fields for named entity categorization in Wikipedia

Yotaro Watanabe, Masayuki Asahara, Yuji Matsumoto

Research output: Contribution to journal › Article

Abstract

This paper presents a method for categorizing named entities in Wikipedia. In Wikipedia, an anchor text is glossed in a linked HTML text. We formalize named entity categorization as the task of categorizing anchor texts whose linked HTML texts gloss a named entity. Using this representation, we introduce a graph structure in which anchor texts are regarded as nodes. To incorporate HTML structure into the graph, three types of cliques are defined based on the HTML tree structure. We propose a method that uses Conditional Random Fields (CRFs) to categorize the nodes of the graph. Since the defined graph may include cycles, exact inference in CRFs is computationally expensive. We introduce an approximate inference method using Tree-based Reparameterization (TRP) to reduce the computational cost. In experiments, our proposed model obtained significant improvements compared to baseline models that use Support Vector Machines.
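The graph construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the three clique types, so the single rule used here (linking anchors that share a parent HTML element) is an assumption, as are the names `build_edges`, `spanning_forest`, and the toy anchor ids. The sketch only shows why such a graph can contain cycles and how a cycle-free spanning structure, the kind of subgraph on which TRP performs exact tree inference, can be extracted.

```python
from itertools import combinations


def build_edges(anchors):
    """Link anchors that share a parent element (hypothetical clique rule).

    anchors: dict mapping an anchor id to the id of its parent HTML element.
    """
    by_parent = {}
    for anchor, parent in anchors.items():
        by_parent.setdefault(parent, []).append(anchor)
    edges = []
    for group in by_parent.values():
        # Every pair of anchors under the same parent gets an edge.
        edges.extend(combinations(sorted(group), 2))
    return sorted(edges)


def spanning_forest(nodes, edges):
    """Keep only edges that do not close a cycle (union-find)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v))
    return tree


# Toy page: three anchors in one list, one anchor in another (assumed data).
anchors = {"a1": "ul1", "a2": "ul1", "a3": "ul1", "a4": "ul2"}
edges = build_edges(anchors)
tree = spanning_forest(anchors, edges)
has_cycle = len(edges) > len(tree)
```

In this toy page the three anchors under `ul1` form a triangle, so the graph is cyclic and one edge is left out of the spanning forest. TRP, as generally described, repeats exact inference over different spanning trees so that every edge is eventually covered; that iteration is omitted here.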

Original language: English
Pages (from-to): 245-254
Number of pages: 10
Journal: Transactions of the Japanese Society for Artificial Intelligence
Volume: 23
Issue number: 4
Publication status: Published - 2008 Jan 1

Keywords

  • Collective classification
  • Conditional random fields
  • Named entity acquisition

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
