To address the slow clustering speed and poor cluster quality of automatic clustering algorithms based on gene expression programming (GEP), a new parallel automatic clustering algorithm is proposed: CGC-Cluster, a GEP automatic clustering algorithm based on the Compute Unified Device Architecture (CUDA) and a coarse-grained parallel model. The GRCM method is used to improve the screening and aggregation of cluster centers in the GEP-based automatic clustering algorithm (GEP-Cluster); CUDA is used to increase parallel processing capacity, and a coarse-grained parallel evolutionary model is used to increase the degree of parallelism. On several well-known data sets, CGC-Cluster is compared with GEP-Cluster in statistical experiments on clustering speed and clustering quality. The experimental results show that CGC-Cluster not only achieves a speedup of about 3 times but also significantly improves cluster quality as measured by three indexes: intra-cluster variance, the Ocq index, and the Dunn index. Finally, the effect of the algorithm parameters on the parallel algorithm is analyzed experimentally.
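Two of the quality indexes named above can be illustrated with a minimal sketch. It assumes the common textbook definitions: intra-cluster variance as the mean squared Euclidean distance from each point to its cluster centroid, and the Dunn index as the smallest inter-cluster point distance divided by the largest cluster diameter. The abstract does not state which variants the experiments use, so the function names and formulas here are illustrative assumptions, and the Ocq index is omitted because its definition is not given.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def intra_cluster_variance(clusters):
    """Mean squared distance from each point to its own cluster centroid
    (assumed definition; lower means tighter clusters)."""
    total = 0.0
    count = 0
    for pts in clusters:
        c = centroid(pts)
        total += sum(dist(p, c) ** 2 for p in pts)
        count += len(pts)
    return total / count


def dunn_index(clusters):
    """Smallest distance between points of different clusters divided by
    the largest cluster diameter (assumed definition; higher is better)."""
    min_separation = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    max_diameter = max(
        (dist(p, q) for pts in clusters for p, q in combinations(pts, 2)),
        default=0.0,
    )
    if max_diameter == 0.0:
        # Every cluster is a single point (or identical points).
        return float("inf")
    return min_separation / max_diameter


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated clusters in the plane.
    clusters = [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 0.0), (5.0, 1.0)]]
    print(intra_cluster_variance(clusters))
    print(dunn_index(clusters))
```

Both functions take a list of clusters, each a list of coordinate tuples, so they can score the output of any clustering algorithm regardless of how the partition was produced.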