Abstract
We propose an approach to correcting spelling errors and assigning part-of-speech (POS) tags simultaneously for sentences written by learners of English as a second language (ESL). ESL writing contains several types of errors, such as preposition, determiner, verb, noun, and spelling errors. Spelling errors often interfere with POS tagging and syntactic parsing, which makes other error detection and correction tasks very difficult. In studies of grammatical error detection and correction in ESL writing, spelling correction has been regarded as a preprocessing step in a pipeline. However, several types of spelling errors in ESL are difficult to correct in a preprocessing step: for example, homophones (e.g. *hear/here), confusion (*quiet/quite), split (*now a day/nowadays), merge (*swimingpool/swimming pool), inflection (*please/pleased) and derivation (*badly/bad), where the incorrect word is actually in the vocabulary and grammatical information is needed to disambiguate. In order to correct these spelling errors, as well as typical typographical errors (*begginning/beginning), we propose a joint analysis of POS tagging and spelling error correction with a CRF (Conditional Random Field)-based model. We present an approach that achieves significantly better accuracies for both POS tagging and spelling correction, compared to existing approaches using either individual or pipeline analysis. We also show that the joint model can deal with novel types of misspelling in ESL writing.
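The idea of joint analysis can be illustrated with a minimal sketch. This is not the authors' model: it is a toy Viterbi search over (corrected word, POS tag) pairs, and every name and number in it (`CANDIDATES`, `LEXICON`, `TRANSITION`, `CHANGE_PENALTY`, the scores themselves) is a hypothetical, hand-set assumption. A real CRF would learn feature weights from data, and the sketch handles only one-to-one substitutions, not split or merge errors.

```python
# Toy joint decoding of spelling correction and POS tagging.
# All tables and scores below are hypothetical, hand-set values for illustration;
# they are not taken from the paper.

# Observed token -> candidate corrections (the token itself is always a candidate).
CANDIDATES = {
    "here": ["here", "hear"],
    "hear": ["hear", "here"],
}

# Word -> {tag: lexical score}. Tags follow Penn Treebank-style names.
LEXICON = {
    "I": {"PRP": 0.0},
    "can": {"MD": 0.0},
    "here": {"RB": 0.0},
    "hear": {"VB": 0.0},
    "you": {"PRP": 0.0},
}

# Tag bigram scores; unseen bigrams fall back to DEFAULT_TRANSITION.
TRANSITION = {
    ("<s>", "PRP"): 0.0,
    ("PRP", "MD"): 0.0,
    ("PRP", "VB"): 0.0,
    ("MD", "VB"): 0.0,
    ("MD", "RB"): -2.0,
    ("VB", "PRP"): 0.0,
    ("RB", "PRP"): -1.5,
}
DEFAULT_TRANSITION = -5.0
CHANGE_PENALTY = -1.0  # cost of replacing the observed token with another word


def joint_decode(tokens):
    """Return the best-scoring list of (word, tag) pairs for the observed tokens."""
    # Map each state (word, tag) to the best (score, path) that ends in it.
    states = {("<s>", "<s>"): (0.0, [])}
    for tok in tokens:
        new_states = {}
        for word in CANDIDATES.get(tok, [tok]):
            edit = 0.0 if word == tok else CHANGE_PENALTY
            for tag, lex in LEXICON.get(word, {"UNK": -3.0}).items():
                best = None
                for (_, prev_tag), (score, path) in states.items():
                    trans = TRANSITION.get((prev_tag, tag), DEFAULT_TRANSITION)
                    total = score + trans + lex + edit
                    if best is None or total > best[0]:
                        best = (total, path + [(word, tag)])
                new_states[(word, tag)] = best
        states = new_states
    return max(states.values(), key=lambda item: item[0])[1]


if __name__ == "__main__":
    print(joint_decode(["I", "can", "here", "you"]))
```

In this toy setting, the homophone error in "I can here you" is corrected to "hear" because a verb after a modal scores higher than an adverb, even after paying the change penalty. A pipeline that spell-checks first would leave "here" untouched, since it is an in-vocabulary word.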
Original language | English |
---|---|
Pages | 2357-2374 |
Number of pages | 18 |
Publication status | Published - 2012 |
Externally published | Yes |
Event | 24th International Conference on Computational Linguistics, COLING 2012 - Mumbai, India; Duration: 2012 Dec 8 → 2012 Dec 15 |
Other
Other | 24th International Conference on Computational Linguistics, COLING 2012 |
---|---|
Country/Territory | India |
City | Mumbai |
Period | 2012 Dec 8 → 2012 Dec 15 |
Keywords
- Part-of-speech tagging
- Spelling error correction
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Language and Linguistics
- Linguistics and Language