Hierarchical covering algorithm
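  • ISSN: 1007-0214
  • Journal: Tsinghua Science and Technology
  • Year: 2014
  • Pages: 76-81
  • Classification: TP393 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology] TP181 [Automation and Computer Technology: Control Science and Engineering; Automation and Computer Technology: Control Theory and Control Engineering]
  • Author affiliation: [1] Department of Computer Science and Technology and Key Lab of Intelligent Computing and Signal Processing, Anhui University, Hefei 230601, China
  • Funding: the National Natural Science Foundation of China (Nos. 61073117 and 61175046); the Provincial Natural Science Research Program of Higher Education Institutions of Anhui Province (No. KJ2013A016); the Academic Innovative Research Projects of Anhui University Graduate Students (No. 10117700183); the 211 Project of Anhui University
  • Related project: Research on the representation of quotient space chains and problem-solving methods for massive information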
Abstract:

Mining from ambiguous data is an important task in data mining. This paper discusses one such task, known as the multi-instance problem. In the multi-instance problem, each pattern is a labeled bag that consists of a number of unlabeled instances. A bag is negative if all of its instances are negative, and positive if it contains at least one positive instance. Because the instances in a positive bag are not labeled, each positive bag is ambiguous. The aim is to classify unseen bags. The main idea of existing multi-instance algorithms is to find the true positive instances in positive bags, convert the multi-instance problem into a supervised one, and obtain the labels of test bags by predicting the labels of their unknown instances. In this paper, we mine multi-instance data from another point of view: excluding the false positive instances in positive bags and predicting the label of an entire unknown bag. We propose an algorithm called Multi-Instance Covering kNN (MICkNN) for mining multi-instance data. Briefly, a constructive covering algorithm is first used to restructure the original multi-instance data. Then, the kNN algorithm is applied to discriminate the false positive instances. In the test stage, we label the test bag directly according to the similarity between the unseen bag and the sphere neighbors obtained from the previous two steps. Experimental results demonstrate that the proposed algorithm is competitive with most state-of-the-art multi-instance methods in both classification accuracy and running time.
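
The bag-labeling rule stated in the abstract can be illustrated with a minimal Python sketch. This is not the MICkNN algorithm; it only shows the multi-instance labeling assumption. The function names, the toy bags, and the reading of "false positive instances" as negative instances that sit inside a positive bag are assumptions made for illustration. In a real data set the per-instance labels are hidden, which is exactly why positive bags are ambiguous.

```python
from typing import Dict, List, Sequence


def bag_label(instance_labels: Sequence[bool]) -> bool:
    """A bag is positive if at least one instance is positive, otherwise negative."""
    return any(instance_labels)


def false_positive_indices(instance_labels: Sequence[bool]) -> List[int]:
    """Indices of negative instances inside a positive bag (illustrative reading).

    Returns an empty list for a negative bag, because none of its
    instances would be mistaken for positive ones.
    """
    if not bag_label(instance_labels):
        return []
    return [i for i, is_positive in enumerate(instance_labels) if not is_positive]


# Hypothetical toy bags; the hidden instance labels are visible here only for illustration.
bags: Dict[str, List[bool]] = {
    "bag_a": [False, False, False],  # all instances negative
    "bag_b": [False, True, False],   # one positive instance
    "bag_c": [True, True],           # every instance positive
}

labels = {name: bag_label(hidden) for name, hidden in bags.items()}
ambiguous = {name: false_positive_indices(hidden) for name, hidden in bags.items()}

print(labels)
print(ambiguous)
```

A learner sees only the values in `labels`, not the lists in `bags`, so for `bag_b` it cannot tell which of the three instances justified the positive label.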


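Journal information
  • Tsinghua Science and Technology (《清华大学学报:自然科学英文版》)
  • Supervising body: Ministry of Education
  • Sponsor: Tsinghua University
  • Editor-in-chief: Sun Jiaguang (孙家广)
  • Address: Tsinghua Yuan, Haidian District, Beijing
  • Postal code: 100084
  • Email: journal@tsinghua.edu.cn
  • Telephone: 010-62788108, 62792994
  • International standard serial number: ISSN 1007-0214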
  • Domestic unified serial number: CN 11-3745/N
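  • Postal distribution code: 82-627
  • Awards:
  • Indexed in: Chemical Abstracts (online), Mathematical Reviews (online), Zentralblatt MATH, Scopus, Engineering Index, Cambridge Scientific Abstracts
  • Citations: 323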