In data mining, clustering is an initial data-processing step. In a dynamic system, new data is added frequently, and reclustering the entire data set every time new data arrives wastes both time and resources. This paper first introduces the basic concepts of clustering and the main categories of clustering methods. On that basis, it proposes a clustering algorithm based on feature vectors that clusters only the newly added data, which saves a large amount of time and resources. Experiments on a dynamic system compare this incremental clustering algorithm with full reclustering when new data is added, and the results show that the incremental clustering algorithm is feasible.
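The abstract does not spell out the algorithm's steps, so the following is only a minimal sketch of the general incremental idea, not the paper's method. It assumes each cluster is summarized by a mean feature vector, that a new point joins the nearest cluster when it lies within a distance cutoff, and that it otherwise starts a new cluster. The class name `IncrementalClusterer`, the `threshold` parameter, and the running-mean update are all hypothetical choices made for illustration.

```python
import math


class IncrementalClusterer:
    """Illustrative threshold-based incremental clustering over feature vectors.

    This is an assumed scheme for demonstration, not the algorithm from the paper.
    """

    def __init__(self, threshold):
        self.threshold = threshold  # hypothetical distance cutoff for joining a cluster
        self.centroids = []         # one mean feature vector per cluster
        self.counts = []            # number of points absorbed by each cluster

    def add(self, vector):
        """Assign one new feature vector and return the index of its cluster."""
        best, best_dist = None, float("inf")
        # Compare the new vector only against cluster summaries, never old points.
        for i, centroid in enumerate(self.centroids):
            d = math.dist(vector, centroid)
            if d < best_dist:
                best, best_dist = i, d

        # No cluster is close enough: open a new one seeded with this vector.
        if best is None or best_dist > self.threshold:
            self.centroids.append(list(vector))
            self.counts.append(1)
            return len(self.centroids) - 1

        # Otherwise fold the vector into the nearest cluster's running mean.
        n = self.counts[best] + 1
        self.centroids[best] = [
            c + (x - c) / n for c, x in zip(self.centroids[best], vector)
        ]
        self.counts[best] = n
        return best


clusterer = IncrementalClusterer(threshold=1.0)
labels = [clusterer.add(v) for v in ([0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.0, 5.4])]
```

Because each call to `add` touches only the stored cluster summaries, the cost of handling a new point grows with the number of clusters rather than with the number of points already processed, which is the saving the abstract attributes to incremental clustering over full reclustering.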