Data quality strongly affects the accuracy of classification algorithms. For a class of numerical data with implicit (inherent) conflicts, we present a consistent classification algorithm based on continuous SOM clustering. Borrowing the idea of continuous functions, we define classification consistency for continuous numerical data. We then modify the computation of the SOM algorithm so that it satisfies the consistency optimality condition derived from this definition. The improved SOM produces a new clustered dataset that reduces the implicit classification inconsistencies that commonly arise in the original data, which improves both the efficiency and the accuracy of the subsequent classifier. Experiments on a real-world dataset show that the proposed approach achieves clearly higher prediction accuracy than the baseline algorithms. We also analyze the advantages of the method from the perspective of VC dimension.
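As a rough illustration of what inherent inconsistency in numerical data can look like, the sketch below flags pairs of samples that lie within a distance `eps` of each other yet carry different class labels. This reflects a continuity-style intuition (nearby inputs should map to the same class). The helper name `inconsistent_pairs`, the Euclidean metric and the threshold `eps` are all hypothetical assumptions. The sketch is not the consistency definition or the improved SOM algorithm proposed here, only a minimal check under those assumptions.

```python
import math
from itertools import combinations


def inconsistent_pairs(samples, labels, eps):
    """Return index pairs (i, j) whose feature vectors lie within eps
    (Euclidean distance) of each other but whose class labels differ.

    Hypothetical helper: a continuity-style consistency check used only
    for illustration, not the definition from the paper.
    """
    pairs = []
    for i, j in combinations(range(len(samples)), 2):
        dist = math.dist(samples[i], samples[j])  # requires Python 3.8+
        if dist <= eps and labels[i] != labels[j]:
            pairs.append((i, j))
    return pairs


# Toy data (made up for illustration): samples 0 and 1 are almost
# identical but labelled differently; samples 2 and 3 are close and agree.
X = [(1.0, 2.0), (1.05, 2.0), (5.0, 5.0), (5.1, 4.9)]
y = ["A", "B", "C", "C"]
print(inconsistent_pairs(X, y, eps=0.2))  # prints [(0, 1)]
```

The quadratic pairwise scan is chosen for clarity. On large datasets a spatial index (for example a k-d tree) would be needed to find close pairs efficiently.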