Chinese part-of-speech (POS) tagging is complicated by tag sets that are fine-grained and contain many categories. To address this problem, the paper proposes a Chinese POS tagging method based on dual-layer conditional random fields (CRFs). The method divides tagging into two stages, each modeled by a single-layer CRF. In the first stage, the lower-layer CRF uses context to assign a coarse-grained POS tag to each word. In the second stage, the higher-layer CRF takes each word together with its coarse-grained tag as contextual features and refines that tag into the final fine-grained POS tag. Using the CRF++ 0.53 toolkit, closed evaluations were performed on the NCC and CTB corpora from Bakeoff-2007 (the international Chinese word segmentation bakeoff), together with comparative experiments on different feature templates. The experimental results show that the method is feasible and achieves good tagging results.
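The two-stage data flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the fine-to-coarse mapping (first letter of the fine tag), the example tags, the function names, and the template string are all assumptions made for illustration. The rows follow the CRF++ training format of one token per line with tab-separated columns and the label in the last column, so the coarse tag becomes column 1 in the second stage and can be referenced from a template as `%x[0,1]`.

```python
from typing import Callable, List, Sequence, Tuple

Tagged = Tuple[str, str]  # (word, fine-grained tag)


def to_coarse(fine_tag: str) -> str:
    # Assumption: the coarse class is the first letter of the fine tag
    # (e.g. "ns" -> "n"). The abstract does not give the actual grouping.
    return fine_tag[:1]


def stage1_rows(sentence: Sequence[Tagged]) -> List[str]:
    # Lower layer: columns are word, coarse tag (label).
    return [f"{word}\t{to_coarse(tag)}" for word, tag in sentence]


def stage2_rows(sentence: Sequence[Tagged], coarse: Sequence[str]) -> List[str]:
    # Higher layer: columns are word, coarse tag (feature), fine tag (label).
    return [f"{word}\t{c}\t{tag}" for (word, tag), c in zip(sentence, coarse)]


# Hypothetical second-stage feature template in CRF++ syntax:
# current word, current coarse tag, and their combination.
STAGE2_TEMPLATE = "U00:%x[0,0]\nU01:%x[0,1]\nU02:%x[0,0]/%x[0,1]\nB\n"


def tag_two_stage(
    words: Sequence[str],
    coarse_model: Callable[[Sequence[str]], List[str]],
    fine_model: Callable[[Sequence[Tuple[str, str]]], List[str]],
) -> List[str]:
    # The models are passed in as plain callables (stand-ins for trained CRFs).
    coarse = coarse_model(words)
    return fine_model(list(zip(words, coarse)))


if __name__ == "__main__":
    sent = [("北京", "ns"), ("是", "v"), ("首都", "n")]
    print("\n".join(stage1_rows(sent)))
    print("\n".join(stage2_rows(sent, [to_coarse(t) for _, t in sent])))
```

At training time the second-stage rows could use gold coarse tags, as in the sketch, or the first stage's predictions; the abstract does not say which the authors chose.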