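Computing large-scale parallel jobs usually involves massive amounts of data and many high-performance computing devices. As grid computing technology increasingly supports scientific computation, the data volume of large-scale parallel jobs is growing ever faster, and the demands on computing speed are rising as well. To make full use of the resources on grids and other computing platforms and to improve job efficiency, users usually have to partition the data to be processed into groups and upload each group to a different platform for computation, which is a major inconvenience for both scientific research and data management. This paper proposes a unified data management space for large-scale parallel job computing that logically integrates the data on heterogeneous grids and computing platforms, thereby greatly improving the efficiency of managing computational data and accelerating scientific work. Finally, the data transfer efficiency and data management capability of the unified space are analyzed through its application to large-scale virtual screening.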
Large-scale task-parallel computing usually involves a large amount of data that can be individually scheduled on different computing resources. With the development of grid technology in scientific computing, the datasets used in large-scale task-parallel computing have grown increasingly rapidly. Scientists usually divide a large dataset into small groups and upload them to different grid platforms to reduce the time needed for the whole task. This paper proposes a unified data management system for large-scale task-parallel computing that logically integrates the data spaces of heterogeneous grid platforms. The system also provides a transparent interface to all of the heterogeneous data spaces. Its application to large-scale virtual screening shows that the system transfers and manages data efficiently.