This study applies different clustering methods to biological sequence data from public databases in order to mine biological information and to understand the relationships among the sequences within a broader and deeper framework. Taking the mRNA sequences of Parkinson's disease-related genes as an example, the score of each pairwise sequence alignment is used to define the distance between sequences. Cluster analysis partitions the sequences into groups so that similar sequences fall into the same cluster and dissimilar sequences into different clusters, which can reveal intrinsic characteristics of the sequence data. Because different clustering methods can give different results, two methods, hierarchical clustering and fuzzy clustering, are applied separately. Genes that both methods assign to the same group are considered to be closely related and to have similar functions. These consistent groupings provide support for the functional classification of genes and a reference for further uncovering the biological knowledge and regularities contained in the sequences.
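The pipeline described above (pairwise alignment scores, a distance derived from them, then clustering) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the scoring parameters, the score-to-distance normalisation, the choice of average linkage, the function names, and the four toy sequences. Only the hierarchical step is shown; fuzzy clustering is omitted.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment score; scoring values are assumed."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        cur = [i * gap]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def distance_matrix(seqs):
    """Assumed normalisation: d = 1 - score / (smaller of the two self-scores).

    Sequences must be non-empty; identical sequences get distance 0.
    """
    n = len(seqs)
    self_score = [nw_score(s, s) for s in seqs]
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        s = nw_score(seqs[i], seqs[j])
        d[i][j] = d[j][i] = 1.0 - s / min(self_score[i], self_score[j])
    return d


def average_linkage(d, k):
    """Agglomerative clustering: merge the two closest clusters until k remain."""
    clusters = [[i] for i in range(len(d))]

    def dist(c1, c2):
        # Average of all pairwise distances between the two clusters.
        return sum(d[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: dist(clusters[p[0]], clusters[p[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]  # x < y, so deleting y is safe
        del clusters[y]
    return [sorted(c) for c in clusters]


# Toy sequences invented for this example, not real Parkinson's-related mRNAs.
seqs = ["AUGGCUAAC", "AUGGCUAAG", "CCGUUAGGA", "CCGUUAGGU"]
clusters = average_linkage(distance_matrix(seqs), k=2)
print(clusters)  # [[0, 1], [2, 3]]
```

For real mRNA sequences, an established alignment tool and a library clustering routine would normally replace these hand-written functions. A fuzzy method could then be run on the same distance matrix, so that its groupings can be compared with the hierarchical ones.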