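Because categorical (symbolic-attribute) data lack inherent geometric properties, existing classification algorithms for numerical data cannot simply be applied to them. To improve classification performance on categorical data, a support vector machine classification approach based on correlation analysis (CA_SVM) is proposed. The correlation between attribute values and labels is analyzed to obtain the influence factor of each attribute value on the label. These factors are then combined with the frequency with which each attribute value occurs within a class, so that every attribute value in the original categorical data is converted to a numerical value without loss of information. The converted data reflect the correlation between attribute values and labels and also effectively represent the distance between values of the same attribute. Finally, a support vector machine (SVM) is used for classification. Experimental results on standard UCI data sets show that the CA_SVM model improves classification accuracy.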
Because categorical data lack inherent geometric properties, existing classification algorithms for numerical data cannot be applied to them directly. To improve classification performance on categorical data, we propose a support vector machine classification approach based on correlation analysis, namely CA_SVM. By analyzing the correlation between attribute values and labels, we obtain the influence factor of each attribute value on the label. Combining these factors with the frequency of attribute values within each class, the approach transforms categorical data into numerical data without losing information. The transformed data reflect the correlation between attribute values and labels and also effectively express the distance between values of the same attribute. A support vector machine (SVM) is then used for classification. The proposed method was tested on data sets downloaded from the UCI repository, and the results indicate that the CA_SVM model improves classification accuracy.
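
The transformation described above can be illustrated with a minimal sketch. The abstract does not state the formula for the influence factor or how it is combined with the within-class frequency, so the scoring used here is an assumption made only for illustration: each attribute value v is mapped, for every class c, to P(c | v) · P(v | c), where P(c | v) stands in for the influence factor and P(v | c) for the within-class frequency. The function names `fit_value_scores` and `transform` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def fit_value_scores(rows, labels):
    """Learn a per-class numeric score for every (attribute, value) pair.

    ASSUMPTION: score = P(class | value) * P(value | class). This is an
    illustrative stand-in, not the formula used by CA_SVM.
    """
    classes = sorted(set(labels))
    class_count = Counter(labels)   # class -> number of objects
    value_count = Counter()         # (attribute index, value) -> occurrences
    joint_count = Counter()         # (attribute index, value, class) -> occurrences

    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            value_count[(j, v)] += 1
            joint_count[(j, v, y)] += 1

    scores = {}
    for (j, v), n_v in value_count.items():
        scores[(j, v)] = [
            # assumed influence factor * within-class frequency
            (joint_count[(j, v, c)] / n_v) * (joint_count[(j, v, c)] / class_count[c])
            for c in classes
        ]
    return classes, scores


def transform(rows, classes, scores):
    """Replace each categorical value by its per-class scores (zeros if unseen)."""
    encoded = []
    for row in rows:
        vec = []
        for j, v in enumerate(row):
            vec.extend(scores.get((j, v), [0.0] * len(classes)))
        encoded.append(vec)
    return encoded


# Toy data: two categorical attributes, two classes.
rows = [("red", "round"), ("red", "long"), ("green", "round"), ("green", "long")]
labels = ["a", "a", "b", "b"]

classes, scores = fit_value_scores(rows, labels)
numeric_rows = transform(rows, classes, scores)
```

In this toy example the first attribute separates the classes perfectly and the second carries no class information, which the scores reflect. The resulting numeric vectors could then be passed to any SVM implementation (for example scikit-learn's `sklearn.svm.SVC`) for the final classification step.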