Conditional random field (CRF) models for Chinese word segmentation do not adapt well to new domains. To improve domain adaptability, this paper proposes a method that combines CRF with a domain dictionary. Based on word-formation rules, it also introduces three disambiguation methods: fixed word-string (collocation) disambiguation, verb disambiguation using a verb dictionary, and word-probability disambiguation. Experimental results show that this segmentation pipeline and these methods improve both the accuracy and the domain adaptability of word segmentation. The F value of the segmentation results increases by 7.6% in the computer science domain and by 8.7% in the medical domain.
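The sketch below shows, under assumptions, what combining CRF output with a domain dictionary can look like, and how the word-level F value is computed. It assumes the CRF has already produced a token list and that the domain dictionary is a plain set of strings. `merge_with_dictionary`, `to_spans`, `prf` and the `max_len` limit are hypothetical names chosen for illustration. This is not the paper's algorithm, and it omits the fixed word-string, verb and word-probability disambiguation steps. It only shows one simple way a dictionary can repair domain terms that the CRF over-segments. The F value is scored in the usual way, by matching word spans against a gold segmentation.

```python
from typing import List, Set, Tuple


def merge_with_dictionary(tokens: List[str], domain_dict: Set[str],
                          max_len: int = 8) -> List[str]:
    """Hypothetical post-correction: greedily merge runs of adjacent CRF tokens
    whose concatenation is a domain-dictionary word (longest match wins).
    It only merges tokens; it never splits a CRF token."""
    out: List[str] = []
    i, n = 0, len(tokens)
    while i < n:
        best_j = i + 1            # default: keep the single CRF token
        text = tokens[i]
        j = i + 1
        # Extend the run while it stays within max_len characters.
        while j < n and len(text) + len(tokens[j]) <= max_len:
            text += tokens[j]
            j += 1
            if text in domain_dict:
                best_j = j        # remember the longest dictionary match
        out.append("".join(tokens[i:best_j]))
        i = best_j
    return out


def to_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Convert a segmentation into a set of (start, end) character offsets."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Word-level precision, recall and F value: a predicted word counts as
    correct only if its exact character span also appears in the gold set."""
    g, p = to_spans(gold), to_spans(pred)
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


if __name__ == "__main__":
    crf_tokens = ["条件", "随机", "场", "分词"]   # assumed over-segmented CRF output
    gold = ["条件随机场", "分词"]
    print(prf(gold, crf_tokens))                 # before dictionary correction
    merged = merge_with_dictionary(crf_tokens, {"条件随机场"})
    print(merged, prf(gold, merged))             # after dictionary correction
```

In this toy case the CRF splits the term 条件随机场 into three tokens, which gives P = 0.25, R = 0.5 and F ≈ 0.33. After the dictionary merge, the output matches the gold segmentation exactly. Longest-match merging is used here only because it is the simplest dictionary strategy. The paper's own disambiguation rules would decide between competing dictionary matches in a more principled way.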