Mining revision log of language learning SNS for automated Japanese error correction

Tomoya Mizumoto, Mamoru Komachi, Masaaki Nagata, Yuji Matsumoto

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Recently, natural language processing research has begun to pay attention to second language learning. However, it is not easy to acquire a large-scale learners' corpus, which is important for research on second language learning using natural language processing. We present an attempt to extract a large-scale Japanese learners' corpus from the revision log of a language learning social network service. This corpus is easy to obtain at large scale, covers a wide variety of topics and styles, and can be a great source of knowledge for both language learners and instructors. We also demonstrate that the extracted learners' corpus of Japanese as a second language can be used as training data for learners' error correction using a statistical machine translation (SMT) approach. We evaluate different granularities of tokenization to alleviate the problem of word segmentation errors caused by erroneous input from language learners. We propose a character-based SMT approach to alleviate the problem of erroneous input from language learners. Experimental results show that the character-based model outperforms the word-based model when the corpus size is small and the test data is written by learners whose L1 is English.
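As an illustration of the character-level tokenization mentioned in the abstract, the following is a minimal sketch of how a learner sentence and its correction might be split into space-separated characters before being passed to an SMT toolkit. The function names and the example sentence pair are hypothetical and are not taken from the paper or its corpus; the actual preprocessing used by the authors may differ.

```python
def to_char_tokens(sentence: str) -> str:
    """Split a sentence into space-separated characters, dropping whitespace."""
    return " ".join(ch for ch in sentence if not ch.isspace())


def from_char_tokens(tokens: str) -> str:
    """Invert to_char_tokens by removing the separating spaces."""
    return "".join(tokens.split())


def make_parallel_lines(pairs):
    """Turn (learner, corrected) pairs into character-tokenized source/target lines."""
    source = [to_char_tokens(learner) for learner, _ in pairs]
    target = [to_char_tokens(corrected) for _, corrected in pairs]
    return source, target


# Hypothetical pair with a particle error (assumption, not from the paper's data).
pairs = [("私は学校を行きます", "私は学校に行きます")]
source, target = make_parallel_lines(pairs)
```

Because each character becomes its own token, this representation does not depend on a word segmenter, which is the motivation the abstract gives for the character-based model.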

Original language: English
Pages (from-to): 420-432
Number of pages: 13
Journal: Transactions of the Japanese Society for Artificial Intelligence
Volume: 28
Issue number: 5
Publication status: Published - 2013 Jul 10

Keywords

  • Japanese error correction
  • Language learning SNS
  • Mining revision log
  • Second language learning

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
