Microarray technology provides a powerful tool for studying the heterogeneity of disease. Current methods, typically based on traditional hierarchical clustering, use a large number of the genes on a microarray as features to discover disease subtypes. As a result, they do not account for the many unrelated (noise) genes among these features, which can mask meaningful partitions and correlations of the disease samples. To avoid this shortcoming, this paper presents a heterogeneity analysis method based on coupled two-way clustering (HCTWC), which searches for meaningful gene clusters (gene signatures) in order to reveal the natural partitions of the samples. The method was applied to a diffuse large B-cell lymphoma (DLBCL) microarray dataset. Using the identified gene clusters as features to cluster the DLBCL samples revealed two new DLBCL subtypes with survival rates of 55% and 25%, respectively (P < 0.05). These results show that HCTWC is effective in addressing disease heterogeneity and has the potential to be a powerful tool for analyzing it from gene expression profiles.
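The core idea, selecting a coherent gene cluster first and then partitioning the samples using only those genes, can be illustrated with a minimal sketch of a single coupling step. The abstract does not specify which clustering algorithms HCTWC uses. Everything below is an assumption made for illustration and is not the authors' method: genes are grouped as connected components of a graph linking pairs with |Pearson r| > 0.8, samples are split with a deterministic 2-means, and the helper names (`gene_clusters`, `two_means`, `coupled_step`), the thresholds, and the synthetic data (10 planted signature genes, 90 noise genes, 30 samples) are all hypothetical.

```python
import numpy as np


def gene_clusters(expr, threshold=0.8, min_size=5):
    """Hypothetical helper: group genes (rows of expr) with strongly correlated profiles.

    Genes i and j are linked when |Pearson r| > threshold; clusters are the
    connected components of that graph. Components smaller than min_size are
    discarded as probable noise. (Assumed technique, not taken from the paper.)
    """
    adj = np.abs(np.corrcoef(expr)) > threshold  # genes x genes link matrix
    n = expr.shape[0]
    seen = np.zeros(n, dtype=bool)
    clusters = []
    for start in range(n):
        if seen[start]:
            continue
        # Depth-first search over the correlation graph.
        stack, members = [start], []
        seen[start] = True
        while stack:
            g = stack.pop()
            members.append(g)
            for h in np.flatnonzero(adj[g] & ~seen):
                seen[h] = True
                stack.append(int(h))
        if len(members) >= min_size:
            clusters.append(sorted(members))
    return clusters


def two_means(points, n_iter=100):
    """Hypothetical helper: split rows of `points` into two groups (Lloyd's k-means, k = 2).

    Deterministic start: the point farthest from the centroid, then the point
    farthest from that one.
    """
    points = np.asarray(points, dtype=float)
    a = int(np.argmax(np.linalg.norm(points - points.mean(axis=0), axis=1)))
    b = int(np.argmax(np.linalg.norm(points - points[a], axis=1)))
    centers = points[[a, b]].copy()
    labels = None
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stable: converged
        labels = new_labels
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = points[labels == k].mean(axis=0)
    return labels


def coupled_step(expr, **kwargs):
    """One coupling step: cluster genes, then re-cluster samples on each gene cluster."""
    results = []
    for genes in gene_clusters(expr, **kwargs):
        sample_labels = two_means(expr[genes].T)  # samples x selected genes only
        results.append((genes, sample_labels))
    return results


# Synthetic data (hypothetical): two planted subtypes hidden among noise genes.
rng = np.random.default_rng(0)
n_samples, n_signal, n_noise = 30, 10, 90
subtype = np.repeat([0, 1], n_samples // 2)
pattern = np.where(subtype == 0, 1.0, -1.0)
signal = 2.0 * pattern + rng.normal(0, 0.5, size=(n_signal, n_samples))
noise = rng.normal(0, 1.0, size=(n_noise, n_samples))
expr = np.vstack([signal, noise])  # genes x samples

found = coupled_step(expr)
for genes, labels in found:
    print("gene cluster:", genes, "sample split:", labels)
```

On this synthetic matrix the only cluster that survives the size filter should be the 10 planted signature genes, and the sample split computed from them should match the planted subtypes up to swapping the two labels. Clustering the samples on all 100 genes instead would let the noise genes dominate the distances, which is the masking effect the abstract describes. The full CTWC idea iterates this step, re-clustering genes within each stable sample group and vice versa; the sketch shows only one pass.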