This paper improves the parallel scalability of the existing fast analysis-of-variance algorithm (FastANOVA), designs an efficient parallel computing model, and proposes MR-ANOVA, a MapReduce-based algorithm for detecting gene-gene interactions. The algorithm addresses the excessive computational complexity that existing gene-gene interaction detection algorithms commonly exhibit on massive data sets. Experimental results show that MR-ANOVA makes full use of the parallel computing power of the cloud platform: as the data size grows, the speedup gradually approaches the number of nodes in the cluster, so gene-gene (epistatic) interactions can be detected efficiently and accurately.
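To illustrate the general idea of spreading a pairwise ANOVA scan over map and reduce steps, the following is a minimal single-machine sketch and not the MR-ANOVA implementation itself. It assumes binary genotypes and a quantitative phenotype, and it scores each SNP pair with a plain one-way ANOVA F statistic over the joint genotype cells, a simplification standing in for the two-locus test and bounds used by FastANOVA. The names `f_statistic`, `map_pairs`, `reduce_top`, and `best_pair` are hypothetical.

```python
from functools import reduce
from itertools import combinations


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of non-empty phenotype groups."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        return 0.0
    grand_mean = sum(values) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    if ss_within == 0.0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def map_pairs(task):
    """Map step: score every SNP pair in one chunk and emit (F, pair) records."""
    genotypes, phenotype, pairs = task
    records = []
    for i, j in pairs:
        cells = {}  # joint genotype (g_i, g_j) -> phenotype values in that cell
        for row, y in zip(genotypes, phenotype):
            cells.setdefault((row[i], row[j]), []).append(y)
        records.append((f_statistic(list(cells.values())), (i, j)))
    return records


def reduce_top(best, records):
    """Reduce step: keep the highest-scoring (F, pair) record seen so far."""
    return max([best] + records)


def best_pair(genotypes, phenotype, n_chunks=2):
    """Split all SNP pairs into chunks, map each chunk, reduce to the top pair."""
    pairs = list(combinations(range(len(genotypes[0])), 2))
    chunks = [pairs[c::n_chunks] for c in range(n_chunks)]
    tasks = [(genotypes, phenotype, chunk) for chunk in chunks]
    mapped = map(map_pairs, tasks)  # on a cluster these tasks would run in parallel
    return reduce(reduce_top, mapped, (float("-inf"), None))
```

Because each chunk of SNP pairs is scored independently, the map tasks share no state, which is the property that lets this kind of scan scale with the number of workers. The result does not depend on how the pairs are chunked.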