Most existing distributed parallel reasoning algorithms for Resource Description Framework (RDF) data need to launch multiple MapReduce jobs. Some of these algorithms, however, are inefficient on RDF Schema (RDFS) / Web Ontology Language (OWL) rules whose antecedents contain instance triples, so overall reasoning efficiency is low. To address this problem, a distributed parallel reasoning algorithm for RDF data that combines Rete with MapReduce (DRRM) is proposed. First, a list of schema triples and a rule markup model are built from the ontology of the RDF data. Then, in the RDFS/OWL reasoning phase, the alpha and beta stages of the Rete algorithm are implemented with MapReduce. Finally, the reasoning results are deduplicated, which completes one pass of reasoning over all RDFS/OWL rules. Experimental results show that the proposed algorithm performs parallel reasoning over large-scale data efficiently and correctly.
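The pipeline summarized above (schema triple list, alpha stage, beta stage, deduplication) can be illustrated with a small single-process sketch. It is not the DRRM implementation: the abstract does not give the rule markup model or the job layout, so the function names, the in-memory schema tables, and the choice of only two RDFS rules (rdfs7 and rdfs9) are all assumptions made for illustration. The map and reduce steps are simulated with plain Python dictionaries rather than a real MapReduce framework.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"


def build_schema_index(triples):
    """Collect schema triples into lookup tables (hypothetical layout)."""
    sub_class = defaultdict(set)  # class -> direct superclasses
    sub_prop = defaultdict(set)   # property -> direct superproperties
    for s, p, o in triples:
        if p == SUBCLASS:
            sub_class[s].add(o)
        elif p == SUBPROP:
            sub_prop[s].add(o)
    return sub_class, sub_prop


def alpha_map(triple, sub_class, sub_prop):
    """Alpha stage as a map step: test one instance triple against the
    single-pattern conditions and emit pairs keyed by (rule, join value)."""
    s, p, o = triple
    if p == RDF_TYPE and o in sub_class:
        yield ("rdfs9", o), s          # (s type C), C has a superclass
    if p in sub_prop:
        yield ("rdfs7", p), (s, o)     # (s p o), p has a superproperty


def beta_reduce(key, values, sub_class, sub_prop):
    """Beta stage as a reduce step: join the grouped instance matches
    with the schema triples that share the join value."""
    rule, join_value = key
    if rule == "rdfs9":
        for s in values:
            for sup in sub_class[join_value]:
                yield (s, RDF_TYPE, sup)
    elif rule == "rdfs7":
        for s, o in values:
            for sup in sub_prop[join_value]:
                yield (s, sup, o)


def reason_once(triples):
    """One pass over the two rules, followed by deduplication."""
    sub_class, sub_prop = build_schema_index(triples)
    groups = defaultdict(list)         # simulated shuffle: group by key
    for t in triples:
        for key, value in alpha_map(t, sub_class, sub_prop):
            groups[key].append(value)
    derived = set()                    # a set drops duplicate conclusions
    for key, values in groups.items():
        derived.update(beta_reduce(key, values, sub_class, sub_prop))
    return derived - set(triples)      # keep only new triples
```

Calling `reason_once` on a handful of triples such as `("ex:Student", "rdfs:subClassOf", "ex:Person")` and `("ex:alice", "rdf:type", "ex:Student")` returns the newly derived triples as a set. The sketch performs a single pass, so transitive conclusions would need repeated calls until no new triples appear.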