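A Comparative Study of Supervised Clustering Methods for Microarray Expression Profiles
  • ISSN: 0578-1752
  • Journal: Scientia Agricultura Sinica (《中国农业科学》)
  • Time: 0
  • Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory; Automation and Computer Technology: Computer Science and Technology]
  • Author affiliation: [1] Jiangsu Provincial Key Laboratory of Genetics and Physiology, Yangzhou University, Yangzhou 225009
  • Funding: National Natural Science Foundation of China (30370758) and the Ministry of Education "Program for New Century Excellent Talents in University".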
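Chinese abstract (translated):

[Objective] To compare the strengths, weaknesses and applicable settings of different supervised clustering methods. [Method] Two Gaussian mixture model clustering methods (GMM), the K-nearest-neighbor method (KNN), binary support vector machines (SVMs) and five multicategory support vector machines (MC-SVMs) were applied to computer-simulated data and to two real microarray datasets. The methods were evaluated by false positives (FP), false negatives (FN), clustering accuracy and the Matthews correlation coefficient (MCC). [Result] (1) For expression profiles of thousands of genes, when the data follow a Gaussian distribution, the two GMM methods give the highest clustering accuracy, and when the training sample is small, GMM-II is more accurate than GMM-I. (2) By comparison, the MC-SVMs are more robust and the most widely applicable, and they are insensitive to high dimensionality. They are suitable not only for clustering expression profiles of thousands of genes, but also for clustering a few dozen samples using thousands of genes as features. (3) Among the MC-SVMs, OVO and DAGSVM are preferable when the sample size is large; OVR, WW and CS give higher clustering accuracy and MCC values when the sample size is small; and the five MC-SVMs perform alike when the sample size is moderate. [Conclusion] It is recommended to try at least two methods, chosen according to the characteristics of the data and the needs of the experiment, in order to obtain the best clustering result.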
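
The evaluation measures named in the abstract (FP, FN, accuracy, MCC) can all be derived from a binary confusion matrix. The following is a minimal Python sketch, not the authors' code: the function names and the toy labels are assumptions made for illustration, and the MCC expression is the standard textbook formula.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for one class treated as 'positive'."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of items assigned to the correct class."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels (hypothetical): 1 = gene belongs to the class, 0 = it does not.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
print(counts, accuracy(*counts), mcc(*counts))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is one reason it is often reported alongside accuracy when class sizes are unbalanced.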

English abstract:

[Objective] This study examines supervised clustering methods for genes in DNA microarray expression data. [Method] Several typical supervised clustering methods, namely Gaussian mixture model-based supervised clustering (GMM), K-nearest-neighbor (KNN), binary support vector machines (SVMs) and multicategory support vector machines (MC-SVMs), were used to classify computer-simulated data and two real datasets. False positives, false negatives, true positives, true negatives, clustering accuracy and the Matthews correlation coefficient (MCC) were compared among these methods. [Result] (1) For classifying expression data of thousands of genes, the two GMM methods have the highest clustering accuracy and the fewest total FP+FN errors, under the assumption that the whole set of microarray data is a finite mixture of multivariate Gaussian distributions. Furthermore, when the number of training samples is very small, the GMM-II method is more accurate than the GMM-I method. (2) In general, the MC-SVMs are more robust and more widely applicable. They are less sensitive to the curse of dimensionality: they are second only to the GMM methods in clustering accuracy on expression data of thousands of genes, and they are more robust than the other techniques on a small number of high-dimensional gene expression samples. (3) Among the MC-SVMs, OVO and DAGSVM perform better when sample sizes are large; the five MC-SVMs perform very similarly when sample sizes are moderate; and OVR, WW and CS yield better results when sample sizes are small. [Conclusion] When choosing a supervised clustering method for microarray data, one should consider the features of the data and the needs of the experiment. At least two of these methods should be tried in order to obtain a better clustering result.
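
The abstract does not expand the MC-SVM abbreviations. Assuming their usual meanings (OVR = one-versus-rest, OVO = one-versus-one, DAGSVM = directed acyclic graph SVM), the sketch below illustrates only how each scheme breaks a multiclass problem into binary ones; it trains no SVM. All function names, and the `decide` callback standing in for a trained pairwise classifier, are hypothetical. WW and CS are usually understood as single-optimization formulations rather than decompositions, so they are not sketched.

```python
from itertools import combinations


def ovr_tasks(classes):
    """One-versus-rest: one binary problem per class (k problems)."""
    return [(c, tuple(o for o in classes if o != c)) for c in classes]


def ovo_tasks(classes):
    """One-versus-one: one binary problem per pair (k*(k-1)/2 problems)."""
    return list(combinations(classes, 2))


def ovo_vote(pairwise_winners):
    """Combine one-versus-one outputs by majority vote."""
    counts = {}
    for winner in pairwise_winners:
        counts[winner] = counts.get(winner, 0) + 1
    return max(counts, key=counts.get)


def dag_predict(classes, decide):
    """DAG-style evaluation over pairwise classifiers.

    `decide(a, b)` is a hypothetical stand-in for a trained a-vs-b
    classifier and must return a or b. Each call eliminates one
    candidate, so only k-1 pairwise decisions are evaluated.
    """
    remaining = list(classes)
    while len(remaining) > 1:
        first, last = remaining[0], remaining[-1]
        if decide(first, last) == first:
            remaining.pop()      # 'last' is eliminated
        else:
            remaining.pop(0)     # 'first' is eliminated
    return remaining[0]


classes = ["A", "B", "C", "D"]
print(len(ovr_tasks(classes)), len(ovo_tasks(classes)))
```

With k classes, OVR needs k binary classifiers and OVO needs k(k-1)/2; the DAG scheme uses the same pairwise classifiers as OVO but consults only k-1 of them per prediction.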

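Journal information
  • Scientia Agricultura Sinica (《中国农业科学》)
  • China Science and Technology Core Journal
  • Supervising body: Ministry of Agriculture of the People's Republic of China
  • Sponsors: Chinese Academy of Agricultural Sciences; China Association of Agricultural Science Societies
  • Editor-in-chief: Wan Jianmin (万建民)
  • Address: Rooms 4101-4103, Library Building, Chinese Academy of Agricultural Sciences, 12 Zhongguancun South Street, Beijing
  • Postal code: 100081
  • Email: zgnykx@caas.cn
  • Telephone: 010-82109808, 82106279
  • International standard serial number: ISSN 0578-1752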
  • Domestic unified serial number: CN 11-1328/S
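  • Postal distribution code: 2-138
  • Awards:
  • "Double High" journal of the China Periodical Phalanx (中国期刊方阵“双高”期刊); nominee, Third China Publishing Government Award
  • Indexed in domestic and international databases:
  • Chemical Abstracts (web edition, USA); CAB Abstracts (UK); Index Copernicus (Poland); Zoological Record (UK); Japan Science and Technology Agency database (Japan); China Science and Technology Core Journals (China); Peking University Core Journals (China; 2000, 2004, 2008, 2011 and 2014 editions); Food Science and Technology Abstracts (UK)
  • Citations: 85620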