As semi-structured documents such as XML become more widely used, research on the similarity of semi-structured XML data has become increasingly important in the database and information retrieval communities. Given a set of XML documents D and a user query q, XML retrieval is the task of finding the documents in D that satisfy q. To make XML retrieval efficient, a new algorithm is proposed for computing the similarity between a user query and XML documents. The algorithm has three steps. First, the user query q is expanded with synonyms from WordNet to obtain q'. Second, q' and every XML document in D are assigned digital signatures, and matching these signatures filters D, removing the many documents that cannot satisfy the query and leaving a subset D' ⊆ D. Third, q' is matched precisely against the documents in D' to produce the retrieval results.
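The three steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation, since the abstract does not specify the signature scheme or the precise-matching measure. The sketch assumes a superimposed-coding bit signature (each term sets a few bits in a fixed-width word) and scores a document by the fraction of query terms it covers. The `SYNONYMS` table is a hypothetical stand-in for a WordNet lookup, and all function names and constants are illustrative.

```python
import hashlib
import re
import xml.etree.ElementTree as ET

# Hypothetical stand-in for WordNet; a real system would query WordNet here.
SYNONYMS = {"car": {"automobile", "auto"}, "price": {"cost"}}

SIG_BITS = 64       # assumed signature width
BITS_PER_TERM = 3   # assumed number of bits set per term


def expand_query(terms):
    """Step 1: turn q into q', one synonym group per original query term."""
    return [{t.lower()} | SYNONYMS.get(t.lower(), set()) for t in terms]


def term_signature(term):
    """Set up to BITS_PER_TERM bits chosen by a stable hash of the term."""
    digest = hashlib.md5(term.encode("utf-8")).digest()
    sig = 0
    for i in range(BITS_PER_TERM):
        sig |= 1 << (digest[i] % SIG_BITS)
    return sig


def doc_terms(xml_text):
    """Collect lowercased element names and text tokens of one XML document."""
    terms = set()
    for elem in ET.fromstring(xml_text).iter():
        terms.add(elem.tag.lower())
        text = (elem.text or "") + " " + (elem.tail or "")
        terms.update(re.findall(r"\w+", text.lower()))
    return terms


def build_signatures(docs):
    """Precompute one signature per document by OR-ing its term signatures."""
    index = {}
    for doc_id, xml_text in docs.items():
        sig = 0
        for term in doc_terms(xml_text):
            sig |= term_signature(term)
        index[doc_id] = sig
    return index


def may_contain(group, doc_sig):
    """True if some synonym's bits are all set (false positives are possible)."""
    return any((term_signature(t) & doc_sig) == term_signature(t) for t in group)


def retrieve(query_terms, docs, index):
    groups = expand_query(query_terms)
    # Step 2: signature filtering yields the candidate subset D'.
    candidates = [d for d, sig in index.items()
                  if any(may_contain(g, sig) for g in groups)]
    # Step 3: precise matching on D' only (assumed measure: term coverage).
    results = []
    for doc_id in candidates:
        terms = doc_terms(docs[doc_id])
        score = sum(1 for g in groups if g & terms) / len(groups)
        if score > 0:
            results.append((doc_id, score))
    return sorted(results, key=lambda r: (-r[1], r[0]))


docs = {
    "a": "<car><price>100</price></car>",
    "b": "<book><title>XML</title></book>",
    "c": "<automobile><cost>5</cost></automobile>",
    "d": "<auto><color>red</color></auto>",
}
index = build_signatures(docs)
ranking = retrieve(["car", "price"], docs, index)
```

The filter never discards a document that contains an expanded query term, because every bit of that term's signature is also set in the document signature. It may let through unrelated documents whose bits collide, which is why the precise-matching step still checks each survivor.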