Microarray technology, as used in biological and medical research, offers a new approach to the diagnosis and treatment of cancer. To discover distinct cancer types and to classify cancer samples accurately, we propose a cluster ensemble framework based on the neural gas (NG) algorithm, the Dual Neural Gas Cluster Ensemble (DNGCE), which uncovers the underlying structure of noisy cancer gene expression profiles. DNGCE applies the neural gas algorithm to cluster not only the sample dimension but also the attribute dimension. It then uses the normalized cut algorithm to partition the consensus matrix constructed from the resulting multiple clustering solutions, which yields a more accurate final clustering. Experiments on cancer gene expression profiles show that the proposed framework outperforms single clustering algorithms and most existing approaches for clustering gene expression data, and could therefore substantially improve the accuracy of cancer diagnosis.
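The pipeline described above (neural gas on both dimensions, a consensus matrix, then a normalized cut) can be illustrated with a minimal sketch. This is not the authors' implementation. It rests on several assumptions the abstract does not confirm: each ensemble member clusters the attributes with neural gas and keeps one randomly chosen attribute per attribute cluster to form a reduced view; the consensus matrix is the plain co-association frequency; and the normalized cut is limited to a two-way split computed by power iteration. The function names (`neural_gas`, `consensus_matrix`, `normalized_cut_bipartition`) and all parameter values are hypothetical.

```python
import math
import random


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def neural_gas(points, k, epochs=30, seed=0):
    """Fit k prototypes with the rank-based neural gas update; return hard labels."""
    rng = random.Random(seed)
    protos = [list(p) for p in rng.sample(points, k)]
    lam0 = max(k / 2.0, 1.0)  # initial neighbourhood range (assumed schedule)
    total, t = epochs * len(points), 0
    for _ in range(epochs):
        for x in rng.sample(points, len(points)):  # random presentation order
            frac = t / total
            eps = 0.5 * (0.01 / 0.5) ** frac    # step size decays 0.5 -> 0.01
            lam = lam0 * (0.01 / lam0) ** frac  # range decays lam0 -> 0.01
            ranked = sorted(range(k), key=lambda j: dist2(x, protos[j]))
            for rank, j in enumerate(ranked):
                h = math.exp(-rank / lam)  # closer-ranked prototypes move more
                protos[j] = [w + eps * h * (xi - w) for w, xi in zip(protos[j], x)]
            t += 1
    return [min(range(k), key=lambda j: dist2(p, protos[j])) for p in points]


def consensus_matrix(data, k_samples, k_attrs, n_runs=10, seed=0):
    """Co-association frequencies over n_runs neural gas clusterings of the samples."""
    rng = random.Random(seed)
    n = len(data)
    columns = [list(col) for col in zip(*data)]  # one vector per attribute
    counts = [[0.0] * n for _ in range(n)]
    for _ in range(n_runs):
        # Attribute dimension: group similar attributes (assumed selection scheme).
        attr_labels = neural_gas(columns, k_attrs, seed=rng.randrange(10**6))
        picked = []
        for c in range(k_attrs):
            members = [j for j, lab in enumerate(attr_labels) if lab == c]
            if members:
                picked.append(rng.choice(members))
        # Sample dimension: cluster the samples on the reduced view.
        view = [[row[j] for j in picked] for row in data]
        labels = neural_gas(view, k_samples, seed=rng.randrange(10**6))
        for a in range(n):
            for b in range(n):
                counts[a][b] += float(labels[a] == labels[b])
    return [[v / n_runs for v in row] for row in counts]


def normalized_cut_bipartition(weights, iters=500, seed=0):
    """Two-way normalized cut from the second eigenvector of D^-1/2 W D^-1/2."""
    rng = random.Random(seed)
    n = len(weights)
    root_deg = [math.sqrt(sum(row)) for row in weights]
    norm = math.sqrt(sum(r * r for r in root_deg))
    top = [r / norm for r in root_deg]  # known leading eigenvector
    vec = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        # Multiply by (M + I) / 2 so that every eigenvalue is non-negative.
        nxt = [
            0.5 * (vec[i] + sum(weights[i][j] * vec[j] / (root_deg[i] * root_deg[j])
                                for j in range(n)))
            for i in range(n)
        ]
        proj = sum(a * b for a, b in zip(nxt, top))
        nxt = [a - proj * b for a, b in zip(nxt, top)]  # deflate leading eigenvector
        size = math.sqrt(sum(a * a for a in nxt)) or 1.0
        vec = [a / size for a in nxt]
    # Undo the D^1/2 scaling and split on the sign of each entry.
    return [int(v / r > 0) for v, r in zip(vec, root_deg)]
```

Calling `consensus_matrix(data, k_samples=2, k_attrs=2)` on a list of sample rows and passing the result to `normalized_cut_bipartition` returns one 0/1 label per sample. A multi-way partition would need a recursive cut or a clustering step on several eigenvectors, which this sketch omits.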