A discriminative candidate generator for string transformations

Naoaki Okazaki, Yoshimasa Tsuruoka, Sophia Ananiadou, Jun'ichi Tsujii

Research output: Contribution to conference (Paper)

16 Citations (Scopus)

Abstract

String transformation, which maps a source string s into its desirable form t*, is related to various applications including stemming, lemmatization, and spelling correction. The essential and important step for string transformation is to generate candidates to which the given string s is likely to be transformed. This paper presents a discriminative approach for generating candidate strings. We use substring substitution rules as features and score them using an L1-regularized logistic regression model. We also propose a procedure to generate negative instances that affect the decision boundary of the model. The advantage of this approach is that candidate strings can be enumerated by an efficient algorithm because the processes of string transformation are tractable in the model. We demonstrate the remarkable performance of the proposed method in normalizing inflected words and spelling variations.
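The idea of using substring substitution rules to enumerate candidates can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the training pairs, and the `context` parameter are hypothetical, and plain rule counts stand in for the weights that the paper learns with L1-regularized logistic regression.

```python
from collections import Counter


def extract_rule(source, target, context=0):
    """Return a (lhs, rhs) substitution rule for one (source, target) pair.

    The rule covers the part where the two strings differ, plus up to
    `context` characters of shared left context (a hypothetical knob).
    """
    shortest = min(len(source), len(target))
    # Length of the longest common prefix.
    p = 0
    while p < shortest and source[p] == target[p]:
        p += 1
    # Length of the longest common suffix that does not overlap the prefix.
    s = 0
    while s < shortest - p and source[len(source) - 1 - s] == target[len(target) - 1 - s]:
        s += 1
    start = max(0, p - context)
    return source[start:len(source) - s], target[start:len(target) - s]


def generate_candidates(source, rule_weights):
    """Apply every rule at every matching position and rank the results.

    Each candidate is scored by the weight of the rule that produced it;
    the best weight is kept when several rules yield the same string.
    """
    candidates = {}
    for (lhs, rhs), weight in rule_weights.items():
        if not lhs:
            continue  # skip pure insertions in this simplified sketch
        i = source.find(lhs)
        while i != -1:
            cand = source[:i] + rhs + source[i + len(lhs):]
            candidates[cand] = max(candidates.get(cand, float("-inf")), weight)
            i = source.find(lhs, i + 1)
    return sorted(candidates.items(), key=lambda kv: -kv[1])


# Hypothetical training pairs (inflected form, lemma), for illustration only.
pairs = [("studies", "study"), ("flies", "fly"), ("parties", "party"), ("boxes", "box")]

# Assumption: rule frequency is used as a stand-in for learned model weights.
rules = Counter(extract_rule(s, t) for s, t in pairs)

ranked = generate_candidates("cities", rules)
print(ranked)
```

Because each candidate is produced by applying a single rule, the enumeration stays cheap, which loosely mirrors the tractability argument in the abstract.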

Original language: English
Pages: 447-456
Number of pages: 10
DOIs: https://doi.org/10.3115/1613715.1613772
Publication status: Published - 2008
Externally published: Yes
Event: 2008 Conference on Empirical Methods in Natural Language Processing, EMNLP 2008, Co-located with AMTA 2008 and the International Workshop on Spoken Language Translation - Honolulu, HI, United States
Duration: 2008 Oct 25 to 2008 Oct 27

Other

Other: 2008 Conference on Empirical Methods in Natural Language Processing, EMNLP 2008, Co-located with AMTA 2008 and the International Workshop on Spoken Language Translation
Country: United States
City: Honolulu, HI
Period: 08/10/25 to 08/10/27

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems


Cite this

Okazaki, N., Tsuruoka, Y., Ananiadou, S., & Tsujii, J. (2008). A discriminative candidate generator for string transformations. 447-456. Paper presented at 2008 Conference on Empirical Methods in Natural Language Processing, EMNLP 2008, Co-located with AMTA 2008 and the International Workshop on Spoken Language Translation, Honolulu, HI, United States. https://doi.org/10.3115/1613715.1613772