This paper proposes a method for Chinese text chunking that combines a distributed strategy based on Conditional Random Fields (CRFs) with an error-driven technique. First, the eleven types of Chinese chunks are divided into groups, and a separate CRFs model is built for each group to identify its chunks. Next, a CRFs-based error-driven technique is applied automatically to the output of each group for a second round of identification. Finally, conflicts between chunk types are resolved by processing the groups in order of their F-measure values. Experimental results show that the approach is effective: in the open test the system achieves a precision of 94.90%, a recall of 91.00%, and an F-measure of 92.91%, outperforming the single CRFs-based approach, the distributed strategy alone, and other hybrid approaches.
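The final conflict-handling step and the reported F-measure can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything here is an assumption: chunks are represented as hypothetical `(start, end, type)` spans, groups are processed from highest to lowest F-measure, and a span is kept only if it does not overlap a span already accepted from a higher-ranked group. The group names, chunk labels, and per-group scores in the usage lines are invented for illustration and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical span representation: (start, end_exclusive, chunk_type).
Span = Tuple[int, int, str]


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def overlaps(a: Span, b: Span) -> bool:
    """True if two half-open spans share at least one token position."""
    return a[0] < b[1] and b[0] < a[1]


def resolve_conflicts(
    group_spans: Dict[str, List[Span]],
    group_f: Dict[str, float],
) -> List[Span]:
    """Merge per-group chunk predictions into one non-overlapping set.

    Assumption: groups with a higher F-measure are trusted first, and a
    later span is dropped when it overlaps an already accepted span.
    """
    accepted: List[Span] = []
    ranked = sorted(group_spans, key=lambda g: group_f[g], reverse=True)
    for group in ranked:
        for span in group_spans[group]:
            if not any(overlaps(span, kept) for kept in accepted):
                accepted.append(span)
    return sorted(accepted)


# Usage with invented data: two groups disagree on tokens 1-2.
group_spans = {
    "group_a": [(0, 2, "NP"), (4, 6, "VP")],
    "group_b": [(1, 3, "PP"), (6, 7, "ADVP")],
}
group_f = {"group_a": 0.95, "group_b": 0.90}
merged = resolve_conflicts(group_spans, group_f)

# Consistency check on the figures reported in the abstract.
reported_f = round(f_measure(94.90, 91.00), 2)
```

With the reported precision of 94.90% and recall of 91.00%, `f_measure` gives 92.91 after rounding, which matches the F-measure stated in the abstract. Ranking by F-measure is a simple way to let the more reliable group models take priority when predictions collide; other tie-breaking rules would fit the same interface.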