This paper studies clustering methods for character-type (categorical) data and mixed-type data. First, building on classical rough set theory, an extended rough set model based on a concordance relation is obtained by relaxing the indiscernibility and tolerance conditions between objects. Second, several concepts are newly defined: the indiscernibility degree between two objects, the indiscernibility degree between two clusters, and the integrated approximation accuracy of a clustering result. Based on these, a new hierarchical clustering algorithm for mixed data types is proposed. The algorithm handles numerical data, as other algorithms do, and also handles character-type and mixed-type data, which most clustering algorithms cannot process. Experiments verify the feasibility of the algorithm.
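The abstract does not give the formal definitions, so the following is only a minimal sketch of the general idea, under stated assumptions: the indiscernibility degree between two objects is taken to be the fraction of attributes on which they cannot be told apart (exact match for character-type values, agreement within a tolerance `tol` for numerical values), and the degree between two clusters is taken to be the average over all cross-cluster pairs. The function names, the `tol` parameter, and both definitions are hypothetical illustrations, not the concordance-relation model or the integrated approximation accuracy defined in the paper.

```python
from itertools import combinations

def indiscernibility(x, y, tol=0.1):
    """Assumed definition: fraction of attributes on which x and y are indiscernible."""
    same = 0
    for a, b in zip(x, y):
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            same += abs(a - b) <= tol   # numerical: equal within a tolerance
        else:
            same += a == b              # character-type: exact match
    return same / len(x)

def cluster_indiscernibility(c1, c2, data, tol=0.1):
    """Assumed definition: average pairwise degree between two clusters of row indices."""
    total = sum(indiscernibility(data[i], data[j], tol) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def hierarchical_cluster(data, k, tol=0.1):
    """Agglomerative sketch: merge the two most indiscernible clusters until k remain."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        # Pick the pair of clusters with the highest between-cluster degree.
        p, q = max(
            combinations(range(len(clusters)), 2),
            key=lambda pq: cluster_indiscernibility(
                clusters[pq[0]], clusters[pq[1]], data, tol
            ),
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return clusters

# Hypothetical mixed-type records: (color, size measurement, size label).
data = [
    ("red", 1.00, "small"),
    ("red", 1.05, "small"),
    ("blue", 5.00, "large"),
    ("blue", 5.02, "large"),
]
result = hierarchical_cluster(data, k=2)
```

On this toy input the two "red" rows and the two "blue" rows end up in separate clusters. A stopping rule based on a quality measure of the clustering, rather than a fixed `k`, would be closer in spirit to the abstract, but its definition is not available here.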