In recent years, research on processing the massive volumes of observation data collected by seafloor observatory networks has required new scientific tools to support the necessary high-performance distributed computing environment. Scientific workflows have received wide attention in research on advanced information infrastructure and have become a concrete means of realizing future research environments. To address this problem, this paper proposes a solution for processing massive seafloor observation data based on the Kepler scientific workflow system, and examines two ways in which the system can invoke a Hadoop cluster for massive data processing, together with their advantages and drawbacks. Experiments compare the efficiency of these two approaches with that of the traditional Java programming mode for invoking a Hadoop cluster, and the results demonstrate that invoking the cluster through Kepler is highly efficient.