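Rule Revising Strategy for Associative Text Classification
  • ISSN: 1000-1239
  • Journal: Journal of Computer Research and Development (《计算机研究与发展》)
  • Date: 0
  • Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory; Automation and Computer Technology: Computer Science and Technology]
  • Author affiliations: [1] China Payment System Research Center, Southwestern University of Finance and Economics, Chengdu 610074; [2] College of Computer Science, Sichuan University, Chengdu 610065; [3] College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387; [4] Intelligent Information Processing Laboratory, Chengdu University of Information Technology, Chengdu 610225
  • Funding: National Key Technology R&D Program of the 11th Five-Year Plan (2006BA105A01); National Natural Science Foundation of China (60773169)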
Chinese abstract (translated):

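An analysis of association-rule-based text classification shows that classification precision can be improved by reducing the misclassification of negative samples while keeping the classification rules correct on positive samples. Based on the idea of the negative selection algorithm, a classification rule revising strategy is proposed: each classification rule is tolerized against the set of negative samples, new rules are generated from the negative samples that the rule misclassifies, and these are combined with the original rule into a new rule, called an enhanced association rule. The enhanced association rules produced by the revising strategy can substantially reduce the misclassification of negative samples and thereby improve classification precision. The effectiveness of the revising strategy is verified by a formal proof and by experiments.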
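The precision argument can be made explicit with the standard definition of precision for a single rule. The notation is ours, not the paper's: for a rule r predicting a class, TP counts the positive samples it classifies correctly and FP the negative samples it misclassifies; r⁺ denotes the revised (enhanced) rule with the same TP and FP⁺ misclassified negatives. The numbers in the last line are hypothetical.

```latex
\text{Precision}(r) = \frac{TP}{TP + FP}, \qquad
\text{Precision}(r^{+}) = \frac{TP}{TP + FP^{+}}, \qquad
FP^{+} \le FP,\; TP > 0 \;\Rightarrow\; \text{Precision}(r^{+}) \ge \text{Precision}(r).

% Hypothetical example: TP = 80, FP = 20, FP^{+} = 5
\frac{80}{80 + 20} = 0.8 \quad\longrightarrow\quad \frac{80}{80 + 5} \approx 0.941
```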

English abstract:

Text classification is an important field in data mining and machine learning. In recent years, the use of association rules for text categorization has attracted great interest, and a variety of useful methods have been developed. These works focus on how to generate classification rules and then select rules to build a high-accuracy classifier. An analysis of association-rule-based text classification shows that reducing the misclassification of negative samples, while leaving the classification of positive samples unchanged, can improve classification accuracy. Inspired by the negative selection algorithm, the authors propose a classification rule revising strategy that implements this observation. First, new rules, called negative rules, are generated by mining frequent itemsets on the negative samples that a classification rule misclassifies. The classification rule is then combined with its negative rules to form an enhanced association rule. Enhanced association rules can dramatically reduce the misclassification of negative samples and therefore improve classification accuracy. Experiments are conducted on a real-world Web page dataset. Compared with the text classification algorithms CMAR, S-EM, and NB, the rule revising strategy can further improve classification accuracy. The utility and feasibility of the rule revising strategy are also demonstrated by a formal proof.
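The two steps described in the abstract (mine negative rules on the negatives a rule misclassifies, then combine them with the rule) can be sketched in Python on toy data. This is a minimal illustration under our own assumptions, not the authors' implementation: documents are sets of terms, a rule body matches a document when all of its terms occur in it, the brute-force `frequent_itemsets` stands in for whatever frequent-itemset miner the paper uses, and all function names, the parameters `min_support` and `max_len`, and the toy data are hypothetical. We also assume that a candidate negative rule is discarded if it matches any positive sample the original rule covers, which is one simple way to keep the classification of positive samples unchanged. The sketch reports per-rule precision.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List

Doc = FrozenSet[str]  # a document represented as a set of terms


def matches(itemset: FrozenSet[str], doc: Doc) -> bool:
    """A rule body matches a document when every term of the body occurs in it."""
    return itemset <= doc


def frequent_itemsets(docs: List[Doc], min_support: int, max_len: int) -> List[FrozenSet[str]]:
    """Brute-force frequent itemset mining; a stand-in for Apriori or FP-growth on small data."""
    vocab = sorted(set().union(*docs))
    found = []
    for k in range(1, max_len + 1):
        for combo in combinations(vocab, k):
            items = frozenset(combo)
            if sum(1 for d in docs if items <= d) >= min_support:
                found.append(items)
    return found


def mine_negative_rules(body, positives, negatives, min_support=2, max_len=2):
    """Hypothetical revising step for one rule `body -> class`; returns its negative rules."""
    # Tolerance step: collect the negative samples the rule misclassifies.
    wrong_neg = [d for d in negatives if matches(body, d)]
    right_pos = [d for d in positives if matches(body, d)]
    # Mine frequent itemsets on the misclassified negatives, ignoring the body's own terms.
    candidates = frequent_itemsets([d - body for d in wrong_neg], min_support, max_len)
    # Assumption: drop candidates that fire on a correctly classified positive,
    # so the enhanced rule keeps every true positive of the original rule.
    return [c for c in candidates if not any(c <= d for d in right_pos)]


def enhanced_match(body, negative_rules, doc) -> bool:
    """Enhanced rule: the original body matches and no negative rule matches."""
    return matches(body, doc) and not any(n <= doc for n in negative_rules)


def precision(predict: Callable[[Doc], bool], positives, negatives) -> float:
    """TP / (TP + FP) for a single rule used as a binary predictor."""
    tp = sum(1 for d in positives if predict(d))
    fp = sum(1 for d in negatives if predict(d))
    return tp / (tp + fp) if tp + fp else 0.0


# Toy data (invented for illustration): documents of one target class vs. the rest.
RULE = frozenset({"payment"})
POSITIVES = [frozenset(s) for s in ({"payment", "bank", "transfer"},
                                    {"payment", "card", "bank"},
                                    {"payment", "settlement"})]
NEGATIVES = [frozenset(s) for s in ({"payment", "game", "coins"},
                                    {"payment", "game", "level"},
                                    {"weather", "rain"},
                                    {"payment", "recipe"})]

if __name__ == "__main__":
    neg_rules = mine_negative_rules(RULE, POSITIVES, NEGATIVES)
    print([sorted(n) for n in neg_rules])  # [['game']]
    print(precision(lambda d: matches(RULE, d), POSITIVES, NEGATIVES))  # 0.5
    print(precision(lambda d: enhanced_match(RULE, neg_rules, d), POSITIVES, NEGATIVES))  # 0.75
```

On this toy data the enhanced rule keeps all three true positives but no longer fires on two of the three misclassified negatives, so the rule's precision rises from 0.5 to 0.75.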

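Journal information
  • Journal of Computer Research and Development (《计算机研究与发展》)
  • China Science and Technology Core Journal
  • Supervising body: Chinese Academy of Sciences
  • Sponsor: Institute of Computing Technology, Chinese Academy of Sciences
  • Editor-in-chief: Xu Zhiwei (徐志伟)
  • Address: Institute of Computing Technology, CAS, No. 6 Kexueyuan South Road, Beijing
  • Postal code: 100190
  • Email: crad@ict.ac.cn
  • Phone: 010-62620696, 62600350
  • International Standard Serial Number: ISSN 1000-1239
  • Domestic unified serial number: CN 11-1777/TP
  • Postal distribution code: 2-654
  • Awards:
  • Top 100 Outstanding Chinese Academic Journals (2001-2007); 2008 China Quality Sci...; China Journal Matrix "Double-Effect" Journal
  • Indexed in:
  • Abstract Journal (Russia); Abstracts and Citation Database (Netherlands); Engineering Index (US); Japan Science and Technology Agency (JST) database; China Science and Technology Core Journals; Peking University Core Journals (2000, 2004, 2008, 2011, and 2014 editions)
  • Times cited: 40349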