

Multi-label Classification Method for Bad Information Based on Label Similarity

ISSN: 1001-3695
Journal: Application Research of Computers (《计算机应用研究》)
Date: not given
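Classification: TP391.43 [Automation and Computer Technology / Computer Application Technology; Automation and Computer Technology / Computer Science and Technology]
Affiliations: [1] Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500; [2] Educational Technology and Network Center, Kunming University of Science and Technology, Kunming 650500; [3] Yunnan Key Laboratory of Computer Technology Applications, Kunming 650500
Funding: National Natural Science Foundation of China (81360230); Innovation Fund for Technology-Based Small and Medium-Sized Enterprises, Ministry of Science and Technology of China (13C26215305404)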

Authors: Liu Zhuoran[1], Hu Yang[1], Liu Li[1], Feng Xupeng[2], Liu Lijun[1], Huang Qingsong[1,3]

Chinese abstract (translated):

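In multi-label classification, the correlation between labels is an important factor in classification performance. Traditional multi-label classification methods such as BR (Binary Relevance) and ML-KNN ignore how label correlations affect the actual classification, so their results have long been unsatisfactory, and they degrade further on bad information, whose categories are highly correlated. To address these problems, this paper improves RAkEL, a classic multi-label classification algorithm. The method first computes similarity coefficients between labels from the training texts, then computes a combined label-similarity coefficient matrix according to a user-defined hierarchy of bad-information categories, and finally, during RAkEL's voting step, redetermines the final label set based on the combined label similarity and the central label. Multi-label classification experiments on a real corpus, compared against traditional methods, show that the proposed method performs well on bad-information classification.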
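The first two steps described above, computing label-to-label similarity from the training texts and then combining it with the bad-information hierarchy, can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not give the formulas, so cosine similarity over label co-occurrence, the same-parent hierarchy score, the mixing weight `alpha`, the category names, and all function names are assumptions.

```python
import numpy as np


def cooccurrence_similarity(Y):
    """Cosine similarity between the label columns of a binary matrix Y (n_docs x n_labels)."""
    Y = np.asarray(Y, dtype=float)
    norms = np.linalg.norm(Y, axis=0)
    norms[norms == 0] = 1.0  # labels absent from training get zero similarity rather than NaN
    return (Y.T @ Y) / np.outer(norms, norms)


def hierarchy_similarity(parents):
    """Hypothetical hierarchy score: 1.0 if two labels share a parent category, else 0.0."""
    p = np.asarray(parents)
    return (p[:, None] == p[None, :]).astype(float)


def combined_similarity(Y, parents, alpha=0.5):
    """Hypothetical mix of data-driven and hierarchy-driven similarity, weighted by alpha."""
    return alpha * cooccurrence_similarity(Y) + (1.0 - alpha) * hierarchy_similarity(parents)


# Toy example: 3 training texts, 3 labels; labels 0 and 1 share a made-up parent category.
Y_train = [[1, 1, 0],
           [1, 0, 0],
           [0, 0, 1]]
parents = ["violence", "violence", "fraud"]  # hypothetical category names
S = combined_similarity(Y_train, parents)
```

In this toy case labels 0 and 1 co-occur in one of the two texts carrying label 0, so their combined similarity is 0.5 × (1/√2) + 0.5 ≈ 0.854, while label 2 is unrelated to both. A weighted sum is only one plausible way to merge the two sources.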

English abstract:

In multi-label classification, the relationships between labels strongly affect classification performance. Traditional multi-label classification methods handle each label independently and ignore the influence of label relationships, so their classification results are often unsatisfactory, especially when dealing with bad information. To address these problems, this paper presents a modified algorithm based on RAkEL, a classic multi-label classification algorithm. The algorithm first computes the similarity coefficients between labels, and then calculates a label similarity coefficient matrix according to a hierarchy of bad-information categories. Finally, in the RAkEL voting process, it determines the result label set using the similarity coefficient matrix. Experimental results on a real corpus of bad information show that the proposed method achieves better performance than traditional multi-label classification methods.
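The final step changes how RAkEL turns votes into a label set. In standard RAkEL, each random k-labelset classifier votes for the labels it predicts, each label's votes are averaged over the classifiers whose labelset contains it, and labels whose average exceeds a threshold `t` are output. The sketch below implements that averaging, then adds a hypothetical similarity-based step: the "central label" is assumed to be the highest-voted label, and a borderline label (average vote at least `t_low`) is also kept when its similarity to the central label reaches `sim_min`. The abstract does not state the actual rule, so this step, its thresholds, and the function names are assumptions; training the base label-powerset classifiers is omitted.

```python
import numpy as np


def rakel_votes(labelsets, predictions, n_labels):
    """Average RAkEL vote per label.

    labelsets: list of label-index tuples, one per k-labelset classifier
    predictions: list of sets, the labels each classifier predicted (a subset of its labelset)
    """
    votes = np.zeros(n_labels)
    counts = np.zeros(n_labels)
    for labelset, predicted in zip(labelsets, predictions):
        for j in labelset:
            counts[j] += 1
            if j in predicted:
                votes[j] += 1
    # Labels not covered by any labelset get an average vote of 0.
    return np.divide(votes, counts, out=np.zeros(n_labels), where=counts > 0)


def decide_labels(avg_votes, S, t=0.5, t_low=0.3, sim_min=0.7):
    """Standard RAkEL thresholding plus a hypothetical similarity-based rescue step."""
    selected = set(np.flatnonzero(avg_votes > t).tolist())
    center = int(np.argmax(avg_votes))  # hypothetical: "central label" = highest-voted label
    for j in range(len(avg_votes)):
        # Hypothetical rule: keep borderline labels that are close to the central label.
        if j not in selected and avg_votes[j] >= t_low and S[center, j] >= sim_min:
            selected.add(j)
    return sorted(selected)


# Toy example: 3 labels, 3 classifiers over 2-labelsets, and a made-up similarity matrix.
avg = rakel_votes([(0, 1), (1, 2), (0, 2)], [{0}, {1}, {0}], n_labels=3)
S = np.array([[1.0, 0.85, 0.0],
              [0.85, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
final = decide_labels(avg, S)
```

With these toy votes, plain thresholding at 0.5 keeps only label 0; the hypothetical similarity step also keeps label 1, which sits exactly at 0.5 and is closely related to the central label, while label 2 receives no votes and stays out.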
