To overcome the shortcomings of traditional KNN and distance-weighted KNN in how they define distance and how they vote, an improved algorithm, Entropy-KNN, is proposed based on the importance of attribute values to the class. First, the distance between two samples is defined as the average information entropy of the attribute values they share; through the important attribute values, this distance effectively measures the similarity of the two samples. Second, Entropy-KNN uses this distance to select the K nearest neighbors of the test sample. Finally, the class label of the test sample is determined from the average distance and the number of neighbors in each class. Experiments on the mushroom data set show that Entropy-KNN achieves higher classification accuracy than traditional KNN and distance-weighted KNN.
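The three steps in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so several details here are assumptions rather than the authors' definitions: the entropy of an attribute value is taken to be the entropy of the class distribution among training samples that carry that value; attributes on which two samples differ are charged the maximum entropy; and the final score for a class is its neighbor count divided by its mean neighbor distance. The function names and the toy rows are hypothetical and are not taken from the mushroom data set.

```python
import math
from collections import Counter, defaultdict


def value_entropies(X, y):
    """Class-distribution entropy for every (attribute index, value) pair."""
    counts = defaultdict(Counter)
    for row, label in zip(X, y):
        for j, v in enumerate(row):
            counts[(j, v)][label] += 1
    ent = {}
    for key, c in counts.items():
        n = sum(c.values())
        ent[key] = -sum((m / n) * math.log2(m / n) for m in c.values())
    return ent


def entropy_distance(a, b, ent, max_ent):
    """Average entropy over attributes (assumed form of the distance).

    Assumption: a shared value contributes its entropy (low entropy means
    an important value, hence a small distance); a mismatch contributes
    max_ent as a penalty.
    """
    total = 0.0
    for j, (va, vb) in enumerate(zip(a, b)):
        total += ent.get((j, va), max_ent) if va == vb else max_ent
    return total / len(a)


def entropy_knn_predict(X, y, query, k=3, eps=1e-9):
    """Pick the K nearest neighbors, then score each class among them."""
    ent = value_entropies(X, y)
    max_ent = math.log2(max(len(set(y)), 2))
    scored = [(entropy_distance(row, query, ent, max_ent), label)
              for row, label in zip(X, y)]
    neighbors = sorted(scored, key=lambda t: t[0])[:k]

    by_class = defaultdict(list)
    for d, label in neighbors:
        by_class[label].append(d)

    # Assumption: more neighbors and a smaller mean distance both raise the score.
    def score(c):
        dists = by_class[c]
        return len(dists) / (sum(dists) / len(dists) + eps)

    return max(by_class, key=score)


# Hypothetical toy rows: (cap color, surface) with a class label.
X = [("red", "smooth"), ("red", "rough"), ("red", "smooth"),
     ("brown", "rough"), ("brown", "smooth"), ("brown", "rough")]
y = ["poisonous", "poisonous", "poisonous", "edible", "edible", "edible"]
print(entropy_knn_predict(X, y, ("red", "rough"), k=3))
```

In this toy data the cap color separates the classes perfectly, so its values have zero entropy and dominate the distance, while the surface values are nearly uninformative. This is the behavior the abstract attributes to the entropy-based distance, although the scoring rule in the paper itself may differ from the one assumed here.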