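This paper proposes a supervised approach to Chinese word sense disambiguation based on the AdaBoost.MH algorithm. The approach uses AdaBoost.MH to strengthen the weak rules produced by decision trees and, after a number of iterations, yields a classification rule of higher accuracy. A simple method for terminating the iterations is also given. To capture knowledge from the context of a polysemous word, a new knowledge source, the semantic category, is introduced alongside traditional knowledge sources such as part-of-speech tags and local collocation sequences; it improves both the learning efficiency of the algorithm and the disambiguation accuracy. In word sense disambiguation experiments on 6 typical polysemous words and the 20 polysemous words in the SENSEVAL-3 Chinese corpus, AdaBoost.MH achieved a high open-test accuracy (85.75%).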
A supervised approach to Chinese word sense disambiguation based on the AdaBoost.MH learning algorithm is presented. AdaBoost.MH is employed to boost the accuracy of the weak rules produced by decision stumps: it repeatedly calls a weak learner and finally combines the results into a more accurate rule. A simple stopping criterion is also presented. To extract more contextual information, we introduce a new knowledge source, semantic categorization, in addition to two classical knowledge sources: the part-of-speech of neighboring words and local collocations. The new knowledge source improves both the learning efficiency of the algorithm and the accuracy of disambiguation. Using these knowledge sources, AdaBoost.MH achieves 85.75% disambiguation accuracy in an open test on 6 typical polysemous words and 20 polysemous words from the SENSEVAL-3 Chinese corpus.
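To make the boosting step concrete, the following is a minimal sketch of AdaBoost.MH with confidence-rated decision stumps over binary context features, written from the general description of the algorithm rather than from this paper's implementation. The toy data, the feature meanings, the smoothing constant `eps`, and the fixed number of rounds are all assumptions made for illustration; in particular, the stopping criterion mentioned in the abstract is not reproduced here.

```python
import math

def train_adaboost_mh(X, y, labels, n_features, rounds=20):
    """X: list of sets of active binary feature indices; y: one sense label per example."""
    m, k = len(X), len(labels)
    eps = 1.0 / (m * k)  # smoothing constant (an assumption of this sketch)
    # D[i][l] is the weight on the (example i, label l) pair; it starts uniform.
    D = [[1.0 / (m * k)] * k for _ in range(m)]
    # sign[i][l] is +1 if label l is the correct sense of example i, else -1.
    sign = [[1 if y[i] == labels[l] else -1 for l in range(k)] for i in range(m)]
    model = []
    for _ in range(rounds):
        best = None
        for j in range(n_features):
            # W[b][l] = [negative weight, positive weight] for block b
            # (feature j absent or present) and label l.
            W = [[[0.0, 0.0] for _ in range(k)] for _ in range(2)]
            for i in range(m):
                b = 1 if j in X[i] else 0
                for l in range(k):
                    W[b][l][1 if sign[i][l] > 0 else 0] += D[i][l]
            # The stump with the smallest Z is the best weak rule this round.
            Z = 2.0 * sum(math.sqrt(W[b][l][0] * W[b][l][1])
                          for b in range(2) for l in range(k))
            if best is None or Z < best[0]:
                c = [[0.5 * math.log((W[b][l][1] + eps) / (W[b][l][0] + eps))
                      for l in range(k)] for b in range(2)]
                best = (Z, j, c)
        _, j, c = best
        model.append((j, c))
        # Reweight: pairs the chosen stump gets wrong gain weight.
        total = 0.0
        for i in range(m):
            b = 1 if j in X[i] else 0
            for l in range(k):
                D[i][l] *= math.exp(-sign[i][l] * c[b][l])
                total += D[i][l]
        for i in range(m):
            for l in range(k):
                D[i][l] /= total
    return model

def predict(model, labels, x):
    """Sum the stump votes per label and return the highest-scoring sense."""
    scores = [0.0] * len(labels)
    for j, c in model:
        b = 1 if j in x else 0
        for l in range(len(labels)):
            scores[l] += c[b][l]
    return labels[max(range(len(labels)), key=scores.__getitem__)]

# Hypothetical toy data: features 0-2 stand in for context cues (for example a
# POS tag, a collocation, a semantic category); feature 3 is uninformative noise.
labels = ["A", "B", "C"]
X = [{0}, {0, 3}, {1}, {1, 3}, {2}, {2, 3}]
y = ["A", "A", "B", "B", "C", "C"]
model = train_adaboost_mh(X, y, labels, n_features=4)
predictions = [predict(model, labels, x) for x in X]
```

Each round selects one binary feature and assigns every sense a real-valued vote for the "feature present" and "feature absent" cases, so the final classifier is a sum of simple, inspectable rules. Here `predictions` holds the senses assigned to the six training contexts.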