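To address the difficulty that common clustering algorithms have in effectively clustering data with complex distributions, a novel iterative adjustable network clustering algorithm is proposed that combines network analysis with clustering based on cost-function optimization. The algorithm models the sample space as a network, turning the data clustering problem into a network analysis problem based on node-growing connections. An adjustable measure of similarity between nodes and a corresponding clustering criterion are designed to build the neighborhood search and node-growing operations, and the connections between network nodes are adjusted as a whole by changing an adjustment coefficient. The new algorithm automatically obtains the number of clusters and the distribution locations of the sample data without the number of clusters being set in advance. The new algorithm was compared with the K-means (KM) algorithm on four artificial datasets with different sample distributions and on a fault diagnosis experiment for valve leakage in a reciprocating compressor. The results show that the iterative adjustable network clustering algorithm can cluster manifold data with complex distributions and clearly outperforms the commonly used KM algorithm in accuracy and degree of automation.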
To address the problem that conventional clustering algorithms fail on data with complex-shaped clusters, a novel clustering algorithm is put forward that combines network analysis with clustering based on function optimization. A network approach is employed to model the data samples, so that data clustering is transformed into network analysis. An adjustable similarity measure and a corresponding clustering criterion function are designed to construct the neighborhood searching and node spanning operations. Additionally, the relations between nodes can be adjusted as a whole by changing the parameters. The algorithm determines the number and location of the clusters jointly, without the number of clusters being preset. Experimental results on four artificial datasets with different manifold structures and on fault diagnosis of reciprocating compressor valve leakage show that the new algorithm not only can cluster data lying on a manifold, but also achieves better accuracy and automaticity than the K-means algorithm.
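The general idea of treating samples as network nodes and growing clusters through neighborhood links can be illustrated with a minimal sketch. This is not the algorithm proposed here: the abstract does not give the similarity measure, the criterion function, or the iteration scheme. The function name `grow_clusters`, the coefficient `alpha`, and the linking rule (distance at most `alpha` times the mean nearest-neighbour distance) are all assumptions chosen for illustration only.

```python
import math
from collections import deque


def grow_clusters(points, alpha=1.0):
    """Label points by growing connected groups of nodes.

    Hypothetical linking rule (an assumption, not taken from the paper):
    two nodes are linked when their Euclidean distance is at most `alpha`
    times the mean nearest-neighbour distance of the whole data set.
    """
    n = len(points)
    if n == 0:
        return []
    if n == 1:
        return [0]

    # Pairwise distances between all nodes.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # The mean nearest-neighbour distance sets the scale of the threshold.
    nearest = [
        min(d for j, d in enumerate(row) if j != i)
        for i, row in enumerate(dist)
    ]
    threshold = alpha * sum(nearest) / n

    labels = [-1] * n  # -1 marks a node not yet assigned to a cluster
    cluster = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        # Grow a new cluster from this seed by breadth-first search.
        labels[seed] = cluster
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            for j in range(n):
                if labels[j] == -1 and dist[i][j] <= threshold:
                    labels[j] = cluster
                    queue.append(j)
        cluster += 1
    return labels


points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
print(grow_clusters(points, alpha=1.5))   # [0, 0, 0, 1, 1, 1]
print(grow_clusters(points, alpha=10.0))  # [0, 0, 0, 0, 0, 0]
```

In this sketch the number of clusters is never supplied; it follows from how many separate groups the growth step finds, and changing `alpha` loosens or tightens every link at once. A K-means run would instead need the cluster count as an input.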