Efficient querying of and inference over large-scale RDF data are key problems in Semantic Web research. Building on a study of the RDFS inference rules and the ORDPATH encoding scheme, this paper proposes S-Index, a new index construction scheme for large-scale RDF data. A distinguishing feature of S-Index is that knowledge entailed by RDFS can be obtained simply by querying the RDF data, which realizes offline inference. The scheme separates the ABox from the TBox and uses ORDPATH codes to capture the semantic information in the TBox: the subclass/superclass relationships among classes, the sub-property/super-property relationships among properties, and the domain and range relationships. These semantic codes are persisted into the RDF triple (SPO) index, so that the triple index itself carries semantic information and thus forms a semantic index, which is finally persisted to the underlying database. A series of experiments was designed and conducted to compare the semantic index with an ordinary index in terms of storage (data loading) and query performance. The results show that S-Index provides effective support for inference at query time without imposing significant additional overhead on a traditional RDF index that lacks semantic support.
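The idea of answering RDFS-entailed queries by lookup alone can be illustrated with a minimal sketch. It is not the S-Index implementation: it assumes a single-inheritance class tree, uses simplified ORDPATH-style labels (tuples of odd ordinals, without the even "caret" components that real ORDPATH uses for later insertions), and keeps everything in memory. The names `ordpath_labels`, `is_subclass`, `instances_of`, and the sample classes are hypothetical. With such labels, a class is a descendant of another exactly when the ancestor's label is a prefix of its own, so the subclass entailment (RDFS rule rdfs9) reduces to a prefix test at query time.

```python
from collections import defaultdict


def ordpath_labels(sub_class_of, root):
    """Assign ORDPATH-style labels (tuples of odd ordinals) to a class tree.

    sub_class_of: iterable of (child, parent) pairs, assumed to form a tree.
    """
    children = defaultdict(list)
    for child, parent in sub_class_of:
        children[parent].append(child)

    labels = {root: (1,)}
    stack = [root]
    while stack:
        node = stack.pop()
        # Siblings receive the odd ordinals 1, 3, 5, ... in sorted order.
        for i, child in enumerate(sorted(children[node])):
            labels[child] = labels[node] + (2 * i + 1,)
            stack.append(child)
    return labels


def is_subclass(labels, c, d):
    """True if c equals d or descends from d (d's label is a prefix of c's)."""
    lc, ld = labels[c], labels[d]
    return lc[:len(ld)] == ld


def instances_of(type_triples, labels, cls):
    """Instances of cls, including those entailed through subclass links."""
    return sorted(s for s, c in type_triples if is_subclass(labels, c, cls))


# Hypothetical TBox (subclass pairs) and ABox (rdf:type assertions).
TBOX = [("Student", "Person"), ("GradStudent", "Student"), ("Professor", "Person")]
ABOX = [("alice", "GradStudent"), ("bob", "Professor"), ("carol", "Student")]

labels = ordpath_labels(TBOX, "Person")
print(labels["GradStudent"])                  # (1, 3, 1)
print(instances_of(ABOX, labels, "Student"))  # ['alice', 'carol']
```

In this sketch no inferred triples are materialized: the labels are computed once when the TBox is loaded, and the query for `Student` returns `alice` even though she is asserted only as a `GradStudent`. A persistent index would presumably store the label alongside each triple so that the prefix test becomes a range scan, but that storage layout is an assumption here and is not detailed in the abstract.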