Atherosclerosis (AS) is a complex pathophysiological disease characterized by lipid accumulation in the vascular wall and influenced by multiple genetic and environmental factors. Identifying atherosclerotic risk disease genes can broaden the understanding of the mechanism of AS and guide its diagnosis and treatment. Although many computational approaches have been proposed for identifying risk disease genes, balancing inference accuracy against computational efficiency remains a major challenge. In this work, a novel method based on entropy-based clustering and double screening, named ECDS, is introduced to identify atherosclerotic risk disease genes. The algorithm integrates functional genomic information and the topological structure of the protein-protein interaction network into an entropy-based clustering method, and then applies a double screening strategy (a support vector machine and a similarity score) to predict atherosclerotic risk disease genes. ECDS identifies 79 risk disease genes from macrophage samples and 113 from foam cell samples. These risk genes share similar biological functions and signaling pathways with known AS disease genes. The results show that ECDS is effective for identifying atherosclerotic risk disease genes. In addition, ECDS can readily be extended to the identification of risk genes for other complex diseases.
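To make the entropy-based clustering step more concrete, the following is a minimal sketch of one common formulation of graph entropy on an interaction network. It is an assumption for illustration only: the abstract does not specify how ECDS defines or minimizes entropy, nor how functional genomic information enters the calculation. The names `vertex_entropy`, `graph_entropy`, `grow_cluster`, and the toy network `toy_graph` are hypothetical. In this formulation, each vertex contributes the binary entropy of the split of its neighbours into those inside and those outside a candidate cluster, and a cluster is grown greedily from a seed as long as the total entropy decreases.

```python
import math


def vertex_entropy(graph, cluster, v):
    """Binary entropy of the split of v's neighbours into inside/outside the cluster."""
    neighbours = graph[v]
    if not neighbours:
        return 0.0
    p_in = sum(1 for u in neighbours if u in cluster) / len(neighbours)
    entropy = 0.0
    for p in (p_in, 1.0 - p_in):
        if p > 0.0:
            entropy -= p * math.log2(p)
    return entropy


def graph_entropy(graph, cluster):
    """Sum of vertex entropies over every vertex in the network."""
    return sum(vertex_entropy(graph, cluster, v) for v in graph)


def grow_cluster(graph, seed):
    """Greedy sketch: start from the seed and its neighbours, then add
    border vertices only while doing so lowers the graph entropy."""
    cluster = {seed} | graph[seed]
    best = graph_entropy(graph, cluster)
    improved = True
    while improved:
        improved = False
        border = {u for v in cluster for u in graph[v]} - cluster
        for u in sorted(border):
            trial = cluster | {u}
            trial_entropy = graph_entropy(graph, trial)
            if trial_entropy < best:
                cluster, best, improved = trial, trial_entropy, True
    return cluster, best


# Hypothetical toy network: two triangles joined by a single edge (c-d).
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e", "f"},
    "e": {"d", "f"},
    "f": {"d", "e"},
}

cluster, entropy = grow_cluster(toy_graph, "a")
print(sorted(cluster), round(entropy, 4))
```

In this toy network the cluster seeded at `a` stays within its own triangle, because adding the bridging vertex `d` would raise the entropy. A pipeline in the spirit of the abstract would then pass the resulting clusters to the two screening steps (a support vector machine and a similarity score), which are not sketched here because the abstract gives no details about their features or thresholds.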