Building on a keyword inverted index and a path index, a heuristic search algorithm is proposed that applies a cost-balanced strategy and an equi-distance strategy, ranks the results by size, and returns the top-k most relevant answers. Resource Description Framework (RDF) data is modeled as an RDF sentence graph, and all text information is encapsulated in the sentence nodes. An answer to a keyword query is modeled as an unrooted tree that contains all the query keywords and whose leaf nodes are all keyword nodes, so keyword search is transformed into a Steiner tree problem. For an RDF sentence graph with n nodes, the index occupies 3n^2 space in the worst case. With k keyword nodes, the time complexity of the search algorithm is O(kn). The approach does not depend on the schema information of the RDF data, and it supports keyword queries over the attribute and relation names contained in the data. Experimental results show that the approach answers keyword queries over RDF data quickly and effectively.
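To make the reduction to a Steiner tree problem concrete, the following is a minimal Python sketch, not the algorithm proposed in the paper. It assumes an undirected, unweighted sentence graph, and it swaps in a plain breadth-first-search heuristic (pick the node with the smallest total hop distance to every keyword, then join the shortest paths) in place of the cost-balanced and equi-distance strategies, whose details the abstract does not give. The names `build_inverted_index`, `bfs_from`, and `connect_keywords`, as well as the sample graph, are hypothetical.

```python
from collections import defaultdict, deque


def build_inverted_index(node_text):
    """Map each lowercase word to the set of sentence nodes whose text contains it."""
    index = defaultdict(set)
    for node, text in node_text.items():
        for word in text.lower().split():
            index[word].add(node)
    return index


def bfs_from(sources, adj):
    """Multi-source BFS: hop distance and BFS parent toward the nearest source node."""
    dist = {s: 0 for s in sources}
    parent = {s: None for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):
            if v not in dist:
                dist[v] = dist[u] + 1
                parent[v] = u
                queue.append(v)
    return dist, parent


def connect_keywords(keywords, node_text, edges):
    """Return (root, edge set, cost) linking one node per keyword, or None.

    Assumption: this is a simple shortest-path heuristic for the Steiner
    tree problem, not the paper's strategies. The union of shortest paths
    is connected but is not guaranteed to be a minimal tree.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    index = build_inverted_index(node_text)

    searches = []
    for kw in keywords:
        sources = index.get(kw.lower(), set())
        if not sources:
            return None  # a keyword that matches no node has no answer
        searches.append(bfs_from(sorted(sources), adj))

    # Candidate roots are the nodes reachable from every keyword.
    candidates = set.intersection(*(set(dist) for dist, _ in searches))
    if not candidates:
        return None
    root = min(candidates, key=lambda n: (sum(d[n] for d, _ in searches), n))

    # Walk from the root back to the nearest matching node of each keyword.
    answer_edges = set()
    for _, parent in searches:
        node = root
        while parent[node] is not None:
            answer_edges.add(frozenset((node, parent[node])))
            node = parent[node]
    cost = sum(dist[root] for dist, _ in searches)
    return root, answer_edges, cost


if __name__ == "__main__":
    # Hypothetical sentence graph: node ids and their encapsulated text.
    text = {"s1": "Alice author", "s2": "wrote", "s3": "cites",
            "s4": "paper title RDF", "s5": "conference venue"}
    links = [("s1", "s2"), ("s2", "s3"), ("s3", "s4"), ("s2", "s5")]
    print(connect_keywords(["alice", "rdf", "venue"], text, links))
```

Each keyword triggers one breadth-first search over the graph, so this sketch touches every node and edge once per keyword. That is in the same spirit as the O(kn) bound stated above, but it is not a reproduction of the paper's analysis.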