In multi-label classification, the correlation between labels is an important factor in classification performance. Classic multi-label methods such as BR and ML-KNN handle each label independently and ignore the influence of label correlations, so their results are often unsatisfactory; performance degrades further when classifying harmful information, whose categories are highly correlated. To address these problems, this paper presents a modified version of RAkEL, a classic multi-label classification algorithm. The method first computes similarity coefficients between labels from the training texts, then combines them with a user-defined hierarchy of harmful-information categories to obtain a comprehensive label similarity coefficient matrix. Finally, during the RAkEL voting process, it uses the comprehensive label similarity together with the center label to re-determine the final result label set. Experiments on a real corpus, comparing the proposed method with traditional multi-label classification methods, show that it achieves better performance on harmful-information classification.
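The three steps summarized in the abstract can be sketched roughly as follows. The abstract gives no formulas, so everything specific here is an assumption made for illustration: Jaccard co-occurrence as the data-driven label similarity, a fixed score of 0.5 for sibling labels in the hierarchy, a linear mix with weight `alpha`, and the thresholds `accept`, `lower`, and `sim_threshold` in the voting step. The "center label" is taken to be the label with the highest vote share. The vote shares themselves are assumed to come from an already trained RAkEL ensemble.

```python
from itertools import combinations


def cooccurrence_similarity(label_sets, labels):
    """Step 1 (assumed measure): Jaccard similarity of label co-occurrence."""
    docs = {l: {i for i, s in enumerate(label_sets) if l in s} for l in labels}
    sim = {a: {b: 1.0 if a == b else 0.0 for b in labels} for a in labels}
    for a, b in combinations(labels, 2):
        union = docs[a] | docs[b]
        score = len(docs[a] & docs[b]) / len(union) if union else 0.0
        sim[a][b] = sim[b][a] = score
    return sim


def hierarchy_similarity(parent, labels):
    """Step 2a (hypothetical scoring): 1.0 same label, 0.5 siblings, else 0.0."""
    sim = {}
    for a in labels:
        sim[a] = {}
        for b in labels:
            if a == b:
                sim[a][b] = 1.0
            elif parent.get(a) is not None and parent.get(a) == parent.get(b):
                sim[a][b] = 0.5
            else:
                sim[a][b] = 0.0
    return sim


def combine(data_sim, hier_sim, alpha=0.5):
    """Step 2b (assumed): linear mix of the two similarity matrices."""
    return {
        a: {b: alpha * data_sim[a][b] + (1 - alpha) * hier_sim[a][b] for b in row}
        for a, row in data_sim.items()
    }


def rerank_votes(votes, sim, accept=0.5, lower=0.3, sim_threshold=0.5):
    """Step 3 (assumed rule): adjust the voted label set around the center label.

    votes maps each label to the share of ensemble members voting for it.
    A label is kept if its share reaches `accept`, or if it is a borderline
    label (share >= `lower`) that is similar enough to the center label.
    """
    center = max(votes, key=votes.get)
    result = set()
    for label, share in votes.items():
        if share >= accept:
            result.add(label)
        elif share >= lower and sim[center][label] >= sim_threshold:
            result.add(label)
    return result


# Toy data: A1 and A2 are siblings under category A; B1 belongs to category B.
labels = ["A1", "A2", "B1"]
training_label_sets = [{"A1", "A2"}, {"A1", "A2"}, {"A1"}, {"B1"}]
parent = {"A1": "A", "A2": "A", "B1": "B"}

combined = combine(
    cooccurrence_similarity(training_label_sets, labels),
    hierarchy_similarity(parent, labels),
)
final_labels = rerank_votes({"A1": 0.9, "A2": 0.4, "B1": 0.4}, combined)
print(sorted(final_labels))
```

In this toy run, A2 and B1 receive the same borderline vote share, but only A2 is retained because it is similar to the center label A1. This is the kind of effect the abstract attributes to using label similarity during voting; the paper's actual rule may differ.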