Abstract: Based on the characteristics of gene expression data, a high-accuracy density-based clustering algorithm called DENGENE is proposed. DENGENE defines a homogeneity test and introduces peak points to improve the search direction, which makes the algorithm better suited to gene expression data. To evaluate its performance, the algorithm was tested on two widely used budding yeast (Saccharomyces cerevisiae) gene expression data sets. The experimental results show that, compared with five model-based clustering algorithms, CAST, and K-means clustering, DENGENE achieves significant improvements in both noise filtering and clustering accuracy.