To reveal the intrinsic relationship between the molecular variation of the H1N1 virus and influenza epidemics, this paper proposes a new method for constructing an evolutionary tree of the H1N1 virus. First, based on 22,455 HA protein sequences of H1N1 influenza virus collected worldwide between 1902 and 2013, a similarity measure between HA protein sequences is constructed from their feature vectors using the inner product. Next, a complete-graph clustering method based on this similarity is used to coarse-grain the data and extract its similarity information systematically. Finally, a structural clustering method based on fuzzy proximity relations is applied to build the evolutionary tree of the H1N1 HA protein sequences, dividing the viruses into 33 major categories. Further analysis shows that the variation of the H1N1 virus is closely related not only to the time of the outbreak but also to the region in which the virus is distributed and to the distance between regions: the closer two regions are, the more similar the evolution of the viruses that broke out in them. The method thus provides a way to analyze the evolutionary tree of a large number of viruses and to show the evolutionary relationships among virus categories from a macroscopic perspective.
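The abstract describes the pipeline only at a high level, so the following is a minimal Python sketch under explicit assumptions, not the authors' implementation. It assumes the feature vector of a sequence is its amino-acid composition (the paper's actual feature vector is not specified here), takes the similarity to be the normalized inner product (cosine) of two feature vectors, and substitutes a simple stand-in for the fuzzy-proximity structural clustering: sequences are linked on the complete similarity graph whenever their similarity is at least a threshold `lam`, and the connected components are reported. For a reflexive, symmetric similarity relation, these components coincide with the classes of the `lam`-cut of its max-min transitive closure. All function names are hypothetical.

```python
import math
from itertools import combinations

# Assumption: the 20 standard amino acids define the feature space.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_vector(seq):
    """Hypothetical feature vector: relative frequency of each standard amino acid."""
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(AMINO_ACIDS)


def inner_product_similarity(u, v):
    """Normalized inner product (cosine); lies in [0, 1] for non-negative vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def threshold_clusters(seqs, lam):
    """Stand-in clustering: connected components of edges with similarity >= lam.

    Equivalent to the lam-cut classes of the max-min transitive closure
    of the (reflexive, symmetric) similarity relation.
    """
    n = len(seqs)
    vecs = [composition_vector(s) for s in seqs]
    parent = list(range(n))  # union-find forest over sequence indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Visit every edge of the complete graph on the sequences.
    for i, j in combinations(range(n), 2):
        if inner_product_similarity(vecs[i], vecs[j]) >= lam:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    toy = ["AAAA", "AAAC", "WWWW"]  # toy sequences, not real HA data
    print(threshold_clusters(toy, 0.9))   # [[0, 1], [2]]
    print(threshold_clusters(toy, 0.95))  # [[0], [1], [2]]
```

Lowering `lam` merges classes and raising it splits them, so sweeping `lam` from high to low yields a nested hierarchy of partitions, which is one way a tree over the sequences can be read off a similarity relation.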