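A New Dynamic Sample Identification Algorithm Based on Rough Sets
  • ISSN: 0469-5097
  • Journal: Journal of Nanjing University (Natural Sciences)
  • Date: 0
  • Pages: 501-506
  • Language: Chinese
  • Classification: TP18 [Automation and Computer Technology: Control Science and Engineering; Automation and Computer Technology: Control Theory and Control Engineering]
  • Author affiliation: [1] Institute of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065
  • Funding: National Natural Science Foundation of China (60573068, 60773113); Natural Science Foundation of Chongqing (2008BA201, 2008BA2041); Science and Technology Research Project of the Chongqing Municipal Education Commission (KJ090512)
  • Related project: Research on the Theory and Methods of Data-Driven Autonomous Knowledge Acquisition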
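Chinese abstract (translated):

Sample identification is the final application of knowledge acquisition and an important topic in data mining research. Many data mining algorithms exist, and how to select an optimal algorithm with strong generalization ability and a high recognition rate has become a research focus. Exploiting the ability of rough sets to handle incomplete and imprecise data, and combining them with support vector machines and decision trees, this paper analyzes the characteristics of the data and proposes to select the optimal classification method dynamically, using the coverage of the rule set by a sample together with a related threshold. A relatively good classification algorithm is thus chosen for each sample at the outset. Simulation experiments verify the effectiveness of the algorithm.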

English abstract:

Sample identification is the ultimate application of knowledge acquisition and an important element of data mining research. Many mining algorithms exist, and how to choose the best algorithm with strong generalization ability is now a main research question. In this paper, we exploit the ability of rough sets to handle incomplete and inaccurate data and combine them with Support Vector Machine (SVM) and decision tree methods. By analyzing the characteristics of the data, we propose to select the optimal classification method dynamically, using the coverage of the rule set by a sample together with a threshold, so that the best algorithm is found for each sample at the outset. The method has four steps. First, rough set methods are used to obtain the rule set. Second, by analyzing the relation between a sample and the rule set, the coverage of the rule set by the sample is used to judge whether rough sets are suitable for identifying that sample. The coverage reflects the number of rules that match the sample. When the coverage is greater (or less) than 1/n, where n is the number of rules obtained, more than one rule (or no rule) matches the sample, so the rough set classifier may misidentify the sample (or refuse to recognize it), and the sample needs further analysis. Third, for each sample left over from step 2, the distance between the sample and the support vector points is computed. When the distance is greater than a certain threshold, SVM can classify the sample well, so the SVM method is used. Fourth, if the distance in step 3 is smaller than the threshold, the decision tree algorithm is used to identify the sample. To verify the effectiveness of the algorithm, we test it on eight data sets from the UCI repository. For each data set, 50 percent of the data is selected at random as the training set and the other 50 percent is used as the test set. The results show that the proposed algorithm achieves a recognition rate as good as that of the current best algorithm. The experimental results have verified
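
The per-sample selection in steps 2 to 4 of the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `Rule` class, the `coverage` and `choose_method` functions, the exact-match rule semantics, the use of the minimum Euclidean distance to the support vectors, and the handling of a distance exactly equal to the threshold are all assumptions made for the example. The rule set, support vectors and threshold are taken as given (steps 1 and the model training are not shown).

```python
from dataclasses import dataclass
from math import dist
from typing import Dict, List, Sequence


@dataclass
class Rule:
    """Hypothetical decision rule: attribute index -> required value."""
    conditions: Dict[int, float]
    label: str

    def matches(self, sample: Sequence[float]) -> bool:
        # Assumption: a rule matches when every listed attribute is equal.
        return all(sample[i] == v for i, v in self.conditions.items())


def coverage(sample: Sequence[float], rules: List[Rule]) -> float:
    """Fraction of the n rules matched by the sample (1/n means exactly one)."""
    return sum(r.matches(sample) for r in rules) / len(rules)


def choose_method(sample: Sequence[float],
                  rules: List[Rule],
                  support_vectors: List[Sequence[float]],
                  threshold: float) -> str:
    """Return the name of the classifier to apply to this sample."""
    # Step 2: exactly one matching rule (coverage == 1/n) -> rough set rules.
    # Counting matches avoids comparing floating-point coverage with 1/n.
    matched = sum(r.matches(sample) for r in rules)
    if matched == 1:
        return "rough_set"

    # Step 3: zero or several rules matched. Assumption: "distance to the
    # support vector points" is the minimum Euclidean distance to any of them.
    nearest = min(dist(sample, sv) for sv in support_vectors)
    if nearest > threshold:
        return "svm"

    # Step 4: the sample lies close to the support vectors. A distance equal
    # to the threshold also lands here; the abstract leaves that case open.
    return "decision_tree"


# Toy data, invented purely for illustration.
rules = [Rule({0: 1}, "a"), Rule({0: 2}, "b"), Rule({1: 5}, "a")]
support_vectors = [(0.0, 0.0)]

print(choose_method((1, 0), rules, support_vectors, 2.0))      # one rule matches
print(choose_method((1, 5), rules, support_vectors, 2.0))      # two rules match, far from SVs
print(choose_method((0.5, 0.5), rules, support_vectors, 2.0))  # no rule matches, near SVs
```

In a real pipeline the returned name would dispatch to a trained rough set rule classifier, SVM, or decision tree; those models are deliberately left out of this sketch.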

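Journal information
  • Journal of Nanjing University (Natural Sciences)
  • Chinese Science and Technology Core Journal
  • Supervising body: Ministry of Education of the People's Republic of China
  • Sponsor: Nanjing University
  • Editor-in-chief: Gong Changde (龚昌德)
  • Address: Editorial Office of the Journal of Nanjing University (Natural Sciences), 22 Hankou Road, Nanjing
  • Postal code: 210093
  • Email: xbnse@netra.nju.edu.cn
  • Phone: 025-83592704
  • International standard serial number: ISSN 0469-5097
  • Domestic serial number: CN 32-1169/N
  • Postal distribution code: 28-25
  • Awards:
  • Chinese Natural Science Core Journal; "Double Effect" journal of the China Periodical Matrix
  • Indexed in domestic and international databases:
  • Chemical Abstracts (online edition, USA); Mathematical Reviews (online edition, USA); Zentralblatt MATH (Germany); Chinese Science and Technology Core Journals; Peking University Core Journals (2004 edition); Peking University Core Journals (2008 edition); Peking University Core Journals (2011 edition); Peking University Core Journals (2014 edition); Peking University Core Journals (2000 edition)
  • Citations: 9316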