To improve the accuracy and effectiveness of network behavior clustering, an improved K-means algorithm for analyzing network behavior is proposed. The algorithm first runs traditional K-means and computes the silhouette coefficients for the K cluster centers, together with the distance between the data in each cluster and that cluster's center. It then automatically selects the best samples and takes their mean as the optimized initial cluster centers, from which clustering is run again. Experiments on UCI data sets show that the algorithm has a short clustering time and improves clustering accuracy.
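The procedure summarized above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state how the "best samples" are chosen, so the rule used here (rank each cluster's members by per-sample silhouette value, break ties by distance to the center, keep the top fraction `keep`) is an assumption, as are all function and parameter names.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(data, centers, max_iter=100):
    """Traditional K-means (Lloyd iteration) from the given initial centers."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
                      for p in data]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, l in zip(data, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = mean(members)
    return centers, labels


def silhouette(data, labels, k):
    """Per-sample silhouette value s = (b - a) / max(a, b)."""
    scores = []
    for i, p in enumerate(data):
        mean_d = {}
        for j in range(k):
            ds = [dist(p, q) for m, q in enumerate(data)
                  if labels[m] == j and m != i]
            if ds:
                mean_d[j] = sum(ds) / len(ds)
        own = labels[i]
        others = [d for j, d in mean_d.items() if j != own]
        if own not in mean_d or not others:
            scores.append(0.0)  # singleton cluster, or no other cluster
            continue
        a, b = mean_d[own], min(others)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores


def refined_centers(data, centers, labels, keep=0.5):
    """ASSUMED selection rule: average the top `keep` fraction of each cluster,
    ranked by high silhouette value, then by small distance to the center."""
    scores = silhouette(data, labels, len(centers))
    new_centers = []
    for j, c in enumerate(centers):
        idx = [i for i, l in enumerate(labels) if l == j]
        if not idx:
            new_centers.append(c)
            continue
        idx.sort(key=lambda i: (-scores[i], dist(data[i], c)))
        n_keep = max(1, int(len(idx) * keep))
        new_centers.append(mean([data[i] for i in idx[:n_keep]]))
    return new_centers


def improved_kmeans(data, k, seed=0):
    """First pass, center refinement, then re-clustering from refined centers."""
    rng = random.Random(seed)
    first_centers, first_labels = kmeans(data, rng.sample(data, k))
    init = refined_centers(data, first_centers, first_labels)
    return kmeans(data, init)
```

Calling `improved_kmeans(data, k)` with `data` as a list of equal-length numeric tuples returns the final centers and one cluster label per point. The silhouette computation here is quadratic in the number of samples, so this sketch is only suited to small data sets.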