This paper studies classification with variable precision rough sets based on the selection of the β boundary threshold, and proposes a new method for selecting that threshold. In previous work the variable precision threshold β was set manually, which limits its applicability to large, complex, and varied data sets. The paper therefore introduces the concept of average inclusion degree and uses it as the threshold for selecting the upper and lower approximation sets. This allows an optimal variable precision threshold to be generated for each type of data set, and moves condition attributes in the boundary region that carry a large amount of information into the positive region. Experimental results show that the improved algorithm enlarges the lower approximation set, shrinks the upper approximation set, and reduces the boundary region. Without any additional training time, classification accuracy improves markedly compared with the traditional Variable Precision Rough Set (VPRS).
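To make the idea concrete, the following is a minimal Python sketch of a threshold derived from an average inclusion degree. It is an illustration under stated assumptions, not the paper's algorithm. The abstract does not give the exact definition of the average inclusion degree, so the sketch assumes it is the mean inclusion degree of the equivalence classes that overlap the target concept. The approximations follow a common VPRS formulation with β in (0.5, 1]: a class joins the lower approximation when its inclusion degree is at least β, and the upper approximation when it exceeds 1 − β. The function names and the toy data are hypothetical.

```python
from fractions import Fraction


def inclusion_degree(eq_class, target):
    """Fraction of the equivalence class that lies inside the target concept."""
    return Fraction(len(eq_class & target), len(eq_class))


def average_inclusion_degree(classes, target):
    """ASSUMPTION: mean inclusion degree over the classes that overlap the target.

    The abstract names the concept but does not define it; this is one
    plausible reading used only for illustration.
    """
    degrees = [inclusion_degree(e, target) for e in classes if e & target]
    if not degrees:
        raise ValueError("no equivalence class overlaps the target concept")
    return sum(degrees, Fraction(0)) / len(degrees)


def vprs_approximations(classes, target, beta):
    """Lower/upper approximation and boundary for a threshold beta in (0.5, 1]."""
    lower, upper = set(), set()
    for e in classes:
        d = inclusion_degree(e, target)
        if d >= beta:        # class is "mostly" inside the target
            lower |= e
        if d > 1 - beta:     # class is not "mostly" outside the target
            upper |= e
    return lower, upper, upper - lower


# Hypothetical toy data: four equivalence classes induced by the condition
# attributes, and one target concept (decision class).
classes = [{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}, {13, 14}]
target = {1, 2, 3, 4, 5, 6, 7, 9, 13}

# Data-driven threshold versus the classical rough set case (beta = 1).
beta = average_inclusion_degree(classes, target)
lower, upper, boundary = vprs_approximations(classes, target, beta)
classic_lower, classic_upper, classic_boundary = vprs_approximations(
    classes, target, Fraction(1)
)

print("beta:", beta)
print("lower:", sorted(lower), "upper:", sorted(upper), "boundary:", sorted(boundary))
print("classical boundary:", sorted(classic_boundary))
```

On this toy data the derived threshold is 5/8. Relative to β = 1, the lower approximation grows from 4 to 8 objects, the upper approximation shrinks from 14 to 10, and the boundary shrinks from 10 objects to 2, which is the qualitative behaviour the abstract reports. The sketch does not handle the case where the averaged value falls at or below 0.5; how the paper treats that case is not stated in the abstract.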