We propose an efficient and accurate query method for large-scale RDF (Resource Description Framework) data, based on graph clustering. First, we partition the large-scale RDF dataset with a graph clustering algorithm that has been shown to perform best on very large graphs. In the resulting partition, edges within each subset are dense and edges between subsets are sparse. Then, we filter the subsets according to the RDF query request and execute the query only on the RDF subsets that remain, which substantially reduces query response time and improves query efficiency. We implemented this method and evaluated its performance on several representative large-scale RDF datasets. The experimental results indicate that, compared with querying directly with RDF-3X, currently the most efficient RDF query engine, the proposed method greatly improves query efficiency while maintaining high recall and precision.
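The partition, filter, and query steps described above can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper. The abstract does not name the clustering algorithm, so a simple deterministic label propagation is swapped in here. The rule of assigning each triple to the cluster of its subject, the rule of keeping only partitions that mention a constant from the query, and all function names (`label_propagation`, `partition`, `select_partitions`, `match`) are assumptions made for illustration. A naive in-memory pattern matcher stands in for a real query engine such as RDF-3X.

```python
from collections import Counter, defaultdict


def label_propagation(triples, rounds=20):
    """Stand-in clustering (assumption): deterministic label propagation
    over the undirected subject-object graph of the RDF triples."""
    neighbors = defaultdict(set)
    for s, _, o in triples:
        neighbors[s].add(o)
        neighbors[o].add(s)
    labels = {node: node for node in neighbors}  # each node starts alone
    for _ in range(rounds):
        changed = False
        for node in sorted(neighbors):
            counts = Counter(labels[n] for n in neighbors[node])
            # Adopt the most frequent neighbour label; ties go to the smallest.
            best = min(counts, key=lambda lab: (-counts[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


def partition(triples, labels):
    """Assign each triple to the cluster of its subject (assumption)."""
    parts = defaultdict(list)
    for triple in triples:
        parts[labels[triple[0]]].append(triple)
    return dict(parts)


def is_var(term):
    return term.startswith("?")


def select_partitions(parts, patterns):
    """Keep only partitions that mention a constant subject/object of the query."""
    constants = {t for s, _, o in patterns for t in (s, o) if not is_var(t)}
    if not constants:  # nothing to filter on: fall back to all data
        return [t for part in parts.values() for t in part]
    selected = []
    for part in parts.values():
        nodes = {t for s, _, o in part for t in (s, o)}
        if nodes & constants:
            selected.extend(part)
    return selected


def match(triples, patterns):
    """Naive basic-graph-pattern matching; returns a list of variable bindings."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                new = dict(binding)
                for term, value in zip(pattern, triple):
                    if is_var(term):
                        if new.setdefault(term, value) != value:
                            break
                    elif term != value:
                        break
                else:
                    extended.append(new)
        bindings = extended
    return bindings


# Toy data: two groups of resources with no edges between them.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "worksAt", "acme"),
    ("dave", "knows", "erin"),
    ("erin", "worksAt", "globex"),
]
query = [("?x", "knows", "?y"), ("?y", "worksAt", "acme")]

parts = partition(triples, label_propagation(triples))
candidates = select_partitions(parts, query)
print(len(candidates), "of", len(triples), "triples scanned")
print(match(candidates, query))
```

In this sketch every answer found on the filtered subset is also an answer on the full dataset, but a match that spans partitions not selected by the filter would be missed. This is one reason why recall has to be measured alongside response time.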