To address the long retrieval times, slow response, and high memory consumption that arise when querying large XML documents, a dual-index structure organized like a B-tree is designed, and a query method that uses this structure to quickly locate target content is proposed. A path-based inverted index is adopted, which reduces the time spent comparing Dewey codes one by one across the retrieved content. In addition, the content of the XML document is segmented into words to construct data units, and a PathGuide index database is built from the logical relationships among these data units, so that nodes irrelevant to the query content are not accessed. Multiple sets of comparative experiments show that the content-based dual-index query method and its optimization scheme have a clear advantage in query efficiency.
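The idea of a path-based inverted index over Dewey codes can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: the function names `build_path_index` and `lookup`, the regex tokenizer standing in for word segmentation, and the sample document are all assumptions made for illustration, and the B-tree-like organization and the PathGuide index are not modeled. Each term maps to the label paths under which it occurs, and each path holds the Dewey codes of the matching elements, so a query restricted to a path reads the codes directly instead of comparing them pairwise.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_path_index(xml_text):
    """Map each term to {label path: [Dewey codes of elements whose own text contains it]}."""
    index = defaultdict(lambda: defaultdict(list))
    root = ET.fromstring(xml_text)

    def visit(node, path, dewey):
        # Naive tokenizer (assumption): lowercase word-character runs of the node's own text.
        for term in set(re.findall(r"\w+", (node.text or "").lower())):
            index[term][path].append(dewey)
        # The i-th child of a node with Dewey code d gets the code d.i (1-based).
        for i, child in enumerate(node, start=1):
            visit(child, f"{path}/{child.tag}", dewey + (i,))

    visit(root, "/" + root.tag, (1,))
    return index


def lookup(index, term, path_suffix=""):
    """Return sorted (path, dewey) hits for a term, optionally limited to paths ending in path_suffix."""
    hits = []
    for path, codes in index.get(term.lower(), {}).items():
        if path.endswith(path_suffix):
            hits.extend((path, ".".join(map(str, code))) for code in codes)
    return sorted(hits)


# Hypothetical sample document, used only to exercise the sketch.
DOC = (
    "<lib>"
    "<book><title>XML indexing</title><author>Li</author></book>"
    "<book><title>Query methods</title><author>Wang</author></book>"
    "<paper><title>XML query</title></paper>"
    "</lib>"
)

index = build_path_index(DOC)
print(lookup(index, "xml"))
print(lookup(index, "query", "/book/title"))
```

Grouping postings by path means that a query such as "query" under `/book/title` never touches the posting stored under `/lib/paper/title`, which is the kind of irrelevant-node access the abstract aims to avoid.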