In recent years, annotating gene function with computational, mathematical, and statistical methods has become both a popular research topic and a challenge, and machine learning methods are the most widely used among them. Bioinformatics researchers continue to propose faster, more effective, and more accurate machine learning methods for gene function annotation, which has greatly promoted the development of biology and medicine. This review gives an overview of the application and progress of machine learning methods in gene function annotation. It introduces several commonly used methods, including support vector machines, the k-nearest-neighbour algorithm, decision trees, random forests, neural networks, Markov random fields, logistic regression, clustering algorithms, and Bayes classifiers. It also discusses how to select data sources, how to improve the algorithms, and how to increase prediction performance when applying machine learning methods to gene function annotation.