Existing clustering methods struggle with data sets whose clusters have arbitrary shapes, varying densities, and particular structural features. To address these shortcomings, a discrete topological manifold is first constructed in the data space. Two relative measures are defined on this structure, the neighborhood density similarity and the smoothness of neighborhood density change, and reachability is then used to define the structural similarity of samples and the class structure. The class structure relation is proved to be an equivalence relation. Next, treating the structural similarity between samples as an attractive force, a clustering method based on compressive transformation is designed. The method handles clusters of arbitrary shape and arbitrary density and offers good interpretability, among other advantages. Finally, comparative experiments on artificial and standard data sets show that the method clearly outperforms the other clustering algorithms compared, in both clustering efficiency and effectiveness.
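The abstract does not give the formal definitions, so the following is only a rough sketch of why reachability over a relative density measure yields an equivalence relation, and hence a partition into classes. Everything specific here is an assumption made for illustration and is not taken from the paper: the k-nearest-neighbor graph standing in for the discrete topological manifold, the inverse mean neighbor distance used as density, the min/max ratio used as density similarity, the threshold `tau`, and all function names. The smoothness measure is omitted, and the compressive transformation is not implemented; it is replaced with a plain union-find over reachable samples.

```python
import math


def knn(points, k):
    """For each point, return its k nearest neighbours as (distance, index) pairs."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result


def local_density(neighbours):
    # Assumed density estimate: inverse of the mean distance to the k neighbours.
    # The small constant avoids division by zero for duplicate points.
    return [len(n) / (sum(d for d, _ in n) + 1e-12) for n in neighbours]


def density_similarity(a, b):
    # Assumed relative measure in (0, 1]; 1 means the two densities are equal.
    # Being a ratio, it does not depend on the absolute scale of the data.
    return min(a, b) / max(a, b)


def cluster(points, k=3, tau=0.5):
    """Label points by the connected components of the 'similar neighbour' graph."""
    neighbours = knn(points, k)
    rho = local_density(neighbours)
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link neighbouring samples whose densities are similar enough. Merging
    # sets makes the link symmetric, and chains of links give reachability.
    for i, neigh in enumerate(neighbours):
        for _, j in neigh:
            if density_similarity(rho[i], rho[j]) >= tau:
                parent[find(i)] = find(j)

    # Reachability is reflexive, symmetric and transitive, so the roots
    # partition the samples into equivalence classes.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(len(points))]
```

For example, calling `cluster(points, k=2)` on a tight unit square of four points and a distant, five times sparser square of four points should assign one label to each square, since the similarity is a ratio rather than an absolute density threshold.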