Attribute reduction is a central research topic in rough set theory and has been applied effectively in fields such as machine learning and data mining. Attribute reduction based on conditional information entropy effectively generalizes attribute reduction in the algebraic view, but it has two main drawbacks: it is sensitive to noise, and in some cases the resulting attribute subset retains many redundant attributes. To address this, we introduce the concept of approximate reduction based on conditional information entropy in decision tables and then propose an approximate reduction algorithm based on conditional information entropy (ARABCIE). The algorithm effectively improves robustness to noise and allows redundant attributes to be kept or discarded according to the needs of the application. Finally, we test classifier performance on benchmark data, using reduced attribute subsets selected at different precision levels, to examine the robustness of ARABCIE.
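To make the idea of an entropy-based approximate reduction concrete, the following is a minimal sketch and not the ARABCIE algorithm itself, whose details are not given here. It assumes Shannon conditional entropy H(D|B) as the measure, assumes that a subset B counts as an approximate reduct when H(D|B) - H(D|C) is at most a precision threshold `eps`, and assumes a greedy forward selection. The function names, the `eps` parameter, and the toy decision table are all hypothetical.

```python
from collections import Counter
from math import log2


def conditional_entropy(rows, cond, dec):
    """Shannon conditional entropy H(dec | cond) of a decision table.

    rows: list of dicts, one per object; cond: list of condition
    attribute names; dec: name of the decision attribute.
    """
    n = len(rows)
    # Count each (condition class, decision value) pair and each condition class.
    joint = Counter((tuple(r[a] for a in cond), r[dec]) for r in rows)
    block = Counter(tuple(r[a] for a in cond) for r in rows)
    return -sum(c / n * log2(c / block[key]) for (key, _), c in joint.items())


def approximate_reduct(rows, attrs, dec, eps=0.0):
    """Greedy forward selection (an assumed strategy, not the paper's).

    Adds attributes until H(dec | chosen) - H(dec | attrs) <= eps.
    eps = 0 asks for an entropy-preserving subset; a larger eps
    tolerates some loss, which is how noise can be absorbed.
    """
    target = conditional_entropy(rows, attrs, dec)
    chosen, remaining = [], list(attrs)
    # The small tolerance guards against floating-point round-off.
    while remaining and conditional_entropy(rows, chosen, dec) - target > eps + 1e-12:
        best = min(remaining,
                   key=lambda a: conditional_entropy(rows, chosen + [a], dec))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical toy decision table: d follows a, except for one noisy object.
rows = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 0, "d": 0},
    {"a": 1, "b": 0, "c": 0, "d": 1},
    {"a": 1, "b": 1, "c": 0, "d": 1},
    {"a": 0, "b": 0, "c": 1, "d": 1},  # noisy object
]
exact = approximate_reduct(rows, ["a", "b", "c"], "d", eps=0.0)
relaxed = approximate_reduct(rows, ["a", "b", "c"], "d", eps=0.6)
```

On this toy table the exact setting keeps `a` and `c`, because `c` is needed only to separate the single noisy object, whereas `eps = 0.6` keeps `a` alone. This illustrates, under the assumptions above, how a precision threshold lets a noise-induced attribute be dropped.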