Traditional clustering algorithms are poorly suited to massive, high-dimensional data. To cluster massive data sets by exploiting the parallel computing capability of cluster systems in a cloud computing environment, this paper proposes a clustering ensemble algorithm based on fractal dimension. First, a fractal-dimension-based clustering algorithm is improved to make it better suited to parallel computation, and the clusterings it produces serve as the initial ensemble members. These members are then combined using a voting-based fusion strategy. Finally, the resulting clustering ensemble algorithm is implemented as a parallel computation in the cloud computing environment. Comparative experiments on UCI data sets verify the effectiveness of the algorithm.
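The voting-based fusion step can be illustrated with a minimal sketch. The abstract does not specify the voting scheme, so what follows is an assumption: each ensemble member's labels are first aligned to a reference partition, and each point then receives the label chosen by a plain majority of members. The function names `align_labels` and `vote_ensemble` are hypothetical, and the brute-force alignment is only practical for a small number of clusters.

```python
from collections import Counter
from itertools import permutations


def align_labels(reference, labels, k):
    """Relabel `labels` to agree with `reference` on as many points as possible.

    Assumption: labels are integers in range(k). All k! relabelings are tried,
    so this is only suitable for small k.
    """
    best_perm, best_agree = None, -1
    for perm in permutations(range(k)):
        agree = sum(1 for r, l in zip(reference, labels) if perm[l] == r)
        if agree > best_agree:
            best_perm, best_agree = perm, agree
    return [best_perm[l] for l in labels]


def vote_ensemble(partitions, k):
    """Fuse several partitions of the same points by per-point majority vote.

    The first partition is used as the reference labeling (an arbitrary choice
    made for this sketch).
    """
    reference = partitions[0]
    aligned = [reference] + [align_labels(reference, p, k) for p in partitions[1:]]
    # Each tuple in zip(*aligned) holds one point's votes from all members.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*aligned)]


if __name__ == "__main__":
    # Three hypothetical ensemble members over six points, with permuted labels.
    members = [
        [0, 0, 1, 1, 2, 2],
        [1, 1, 0, 0, 2, 2],
        [2, 2, 1, 1, 0, 1],
    ]
    print(vote_ensemble(members, 3))
```

Because each member can be aligned to the reference independently of the others, the alignment step parallelizes naturally across workers. For larger numbers of clusters, the brute-force search could be replaced by an assignment solver such as the Hungarian algorithm.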