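Part-of-speech tagging of English essays is the basis for grading them automatically. Although researchers have done a great deal of useful work on English part-of-speech tagging, most of it targets users for whom English is a first language, and very little addresses users for whom English is a second language. To this end, English essays written by second-language users were manually annotated, and on that basis a part-of-speech tagging algorithm for English essays is proposed that combines features such as word clusters, statistics from unlabeled corpora, and word pronunciation. Experimental results show that the algorithm effectively improves tagging performance, raising tagging accuracy from 94.49% to 97.07%.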
Part-of-speech tagging of Chinese English learner language is the foundation of automated essay scoring systems. Although much fruitful research on part-of-speech tagging has been done, most of it focuses on English essays written by native speakers, and very little addresses essays written by Chinese learners of English. A corpus of essays by Chinese English learners is annotated, and a part-of-speech tagging algorithm for Chinese English learner language is presented. The algorithm uses rich features, such as unsupervised word clusters, an unsupervised tag dictionary, and phonetic normalization. With these features, the system outperforms a state-of-the-art tagger on the corpus, raising tagging accuracy from 94.49% to 97.07%.
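As a rough illustration of how the three feature families named above might be combined for a single token, the following is a minimal sketch and not the implementation described in this paper. The resources `WORD_CLUSTERS` and `TAG_DICTIONARY`, the helper `phonetic_key`, and all feature names are hypothetical; in particular, the phonetic normalization shown here is a crude stand-in (collapse repeated letters, drop non-initial vowels) chosen only to show how learner misspellings could map to the same key.

```python
import re

# Hypothetical resources. In practice both would be induced from unlabeled text;
# the entries below are toy values for illustration only.
WORD_CLUSTERS = {"good": "0110", "great": "0110", "run": "1011"}  # bit-string cluster IDs
TAG_DICTIONARY = {"good": ["JJ"], "run": ["VB", "NN"]}            # candidate tags per word


def phonetic_key(word):
    """Crude phonetic normalization (an assumption, not the paper's method).

    Lowercases the word, removes non-letters, collapses runs of a repeated
    letter, and drops vowels after the first character.
    """
    w = re.sub(r"[^a-z]", "", word.lower())
    if not w:
        return ""
    w = re.sub(r"(.)\1+", r"\1", w)           # "gooood" -> "god"
    return w[0] + re.sub(r"[aeiouy]", "", w[1:])


def token_features(tokens, i):
    """Return a sparse feature dict for the token at position i."""
    lower = tokens[i].lower()
    feats = {
        "word=" + lower: 1.0,
        "suffix3=" + lower[-3:]: 1.0,
        "phon=" + phonetic_key(lower): 1.0,
    }
    # Word-cluster features: prefixes of the cluster bit string at two granularities.
    cluster = WORD_CLUSTERS.get(lower)
    if cluster is not None:
        for n in (2, 4):
            feats["cluster%d=%s" % (n, cluster[:n])] = 1.0
    # Tag-dictionary features: one indicator per candidate tag for this word.
    for tag in TAG_DICTIONARY.get(lower, []):
        feats["dict_tag=" + tag] = 1.0
    return feats
```

A feature dictionary of this shape could be passed to any sequence classifier that accepts sparse features. With this particular key function, the misspelling "becuase" and the word "because" receive the same phonetic feature, which is the kind of effect phonetic normalization is meant to achieve on learner text.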