科学工作流的数据源视图根据数据源中任务间的数据流关系,将它们划分为多个复合模块,并在此基础上进行数据抽象与封装,从而可有效降低科研工作者的数据分析工作量并节省数据查询时间。然而在云计算环境中开发与应用科学工作流系统时,由于受数据采集的准确度和服务器的可靠性影响,将会导致工作流数据源图的不确定性,因此需要提供有效的机制在不确定数据源图中构建合理性视图。针对此方面,首先给出了不确定数据源图及其合理性视图的定义,在此基础上提出了一种检测不合理视图的方法;还进一步分析了数据源图中任务节点与其一阶前序节点之间存在的多种数据流关系及复合任务的局部期望支持度,给出了合理视图的构造方法,设计了相应的多项式时间算法,并分析算法的时间复杂度。最后,对相关方法给出示例,并进行实验分析,验证了其可行性与有效性。
A view of data provenance in scientific workflows abstracts and encapsulates data by partitioning the tasks in a data provenance graph (DPG) into a set of composite modules according to the data flow relations among them, which reduces the effort researchers spend analyzing provenance data and the time needed to query it. However, when scientific workflow systems are developed and applied in cloud computing environments, inaccurate data collection and unreliable data servers distributed across the Internet make the DPG uncertain, so an effective mechanism is needed for constructing sound views over an uncertain DPG. To address this problem, definitions of the uncertain DPG and its sound view are given first, and a method for detecting unsound views of the DPG is then proposed. Next, the data flow relations between each task and its first-order preceding tasks in the graph are analyzed, together with the local expected support of composite modules, and a method for constructing sound, high-support views is presented on that basis. A polynomial-time algorithm is designed, and its worst-case time complexity is analyzed. Finally, an example and experiments are presented to demonstrate the feasibility and effectiveness of the method.