Computing attribute reductions is a key part of inductive learning in the rough set framework. The attribute reduction algorithm based on the discernibility matrix is among the most commonly used: given an information system, it finds all of the system's reductions. However, it requires a large amount of memory and a long execution time; for large databases in particular, storing the discernibility matrix becomes the bottleneck. To address this problem, an attribute reduction algorithm based on instance selection is proposed. The algorithm has three stages: first, the most informative instances are selected from the training set; second, the discernibility matrix is constructed from the selected instances; finally, all reductions of the information system are computed. Experimental results show that, on large databases, the proposed algorithm effectively reduces both storage space and execution time.
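
The three stages described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how informative instances are chosen, so `select_instances` is a hypothetical stand-in that only drops repeated (row, label) pairs. The function names, the toy decision table, and the brute-force reduct search are likewise assumptions made for illustration; the search is exponential in the number of attributes and only suitable for small examples.

```python
from itertools import combinations


def select_instances(rows, labels):
    """Stage 1 (hypothetical stand-in): keep the first copy of each
    distinct (row, label) pair. The paper's actual selection criterion
    is not given in the abstract."""
    seen, kept_rows, kept_labels = set(), [], []
    for row, label in zip(rows, labels):
        if (row, label) not in seen:
            seen.add((row, label))
            kept_rows.append(row)
            kept_labels.append(label)
    return kept_rows, kept_labels


def discernibility_entries(rows, labels):
    """Stage 2: collect the non-empty entries of the discernibility matrix.
    Each entry is the set of attribute indices on which two instances
    with different decision labels differ."""
    entries = set()
    for (x, dx), (y, dy) in combinations(zip(rows, labels), 2):
        if dx == dy:
            continue  # only pairs with different decisions need discerning
        diff = frozenset(a for a, (u, v) in enumerate(zip(x, y)) if u != v)
        if diff:
            entries.add(diff)
    return entries


def all_reducts(entries, n_attrs):
    """Stage 3: enumerate every minimal attribute subset that intersects
    all entries, by brute force in order of increasing size."""
    reducts = []
    for k in range(n_attrs + 1):
        for subset in combinations(range(n_attrs), k):
            s = set(subset)
            if any(r <= s for r in reducts):
                continue  # contains a smaller reduct, so not minimal
            if all(s & e for e in entries):
                reducts.append(s)
    return reducts


# Toy decision table: three condition attributes, one decision label.
rows = [(1, 0, 1), (1, 0, 1), (0, 0, 1), (1, 1, 0), (0, 0, 1)]
labels = [0, 0, 1, 1, 1]

selected_rows, selected_labels = select_instances(rows, labels)
entries = discernibility_entries(selected_rows, selected_labels)
reducts = all_reducts(entries, n_attrs=3)
print(reducts)
```

On this toy table the selection keeps 3 of the 5 instances, the matrix entries are {0} and {1, 2}, and the reducts are {0, 1} and {0, 2}. Dropping repeated rows leaves the set of matrix entries unchanged, which is why this particular stand-in is safe; a more aggressive selection rule would need its own justification that the reductions are preserved.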