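XML documents contain both content and structure, so in addition to content-only (CO) retrieval they also support content-and-structure (CAS) retrieval. This paper proposes a new CAS retrieval method that relies primarily on content retrieval, with structure matching in a supporting role: structural constraints mainly affect how nodes are scored rather than which nodes are selected as answers. The method proceeds in three steps: first, a CAS query is decomposed into several query fragments; next, each query fragment is processed; finally, the partial results obtained for the individual fragments are combined into the final query result. A new scoring scheme is also proposed, which first computes the score of a query result on each query fragment and then sums these scores to obtain the final score. The proposed scoring method scores results by the relevance of both their content and their structure, and therefore better matches the user's query intent and the query semantics. Extensive experimental results verify the effectiveness of the proposed method.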
XML documents involve both content and structure, and can be retrieved by means of not only content-only (CO) but also content-and-structure (CAS) queries. In this paper, a novel approach for CAS retrieval is proposed. The approach proceeds in three steps: it first decomposes a CAS query into a set of query fragments, then processes each query fragment, and finally combines the results obtained for the individual fragments. With this approach, on the one hand, the adverse effects of structural vagueness on answer node selection can be removed; on the other hand, the effect of structural constraints on scoring is properly incorporated. These features make the approach applicable in a variety of homogeneous and heterogeneous data environments. To measure the relevance of query results to a given CAS query, a novel scoring scheme is presented. In accordance with the query processing approach, the scoring method first computes the scores of a query result with respect to each query fragment, and then combines these partial scores to arrive at an overall score. The proposed scoring method considers the relevance of both content and structure in the retrieval results, and thus reflects the user's query intention and conforms to the query semantics. Comprehensive experimental studies demonstrate the effectiveness of the proposed methods.
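The three-step pipeline and the fragment-wise scoring described above can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the simplified NEXI-like query syntax, the names `Fragment`, `Node`, `decompose`, and `retrieve`, the term-overlap content score, the path-subsequence structure score, and the weight `alpha`. The only properties it tries to mirror are that structure scales a node's score without excluding the node, and that per-fragment scores are summed into an overall score.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Fragment:          # hypothetical: one structural path plus its keywords
    path: tuple
    terms: tuple

@dataclass(frozen=True)
class Node:              # hypothetical: an XML element reduced to tag path + text
    path: tuple
    text: str

# Assumed query syntax: //tag or //tag[about(., term term ...)]
STEP = re.compile(r"//(\w+)(?:\[about\(\.,\s*([^)]*)\)\])?")

def decompose(query):
    """Step 1: split a CAS query into fragments (path so far + keywords)."""
    fragments, path = [], []
    for tag, terms in STEP.findall(query):
        path.append(tag)
        if terms:
            fragments.append(Fragment(tuple(path), tuple(terms.lower().split())))
    return fragments

def content_score(node, frag):
    """Fraction of the fragment's keywords that occur in the node's text."""
    words = set(node.text.lower().split())
    return sum(t in words for t in frag.terms) / len(frag.terms)

def structure_score(node, frag):
    """Fraction of the fragment's tags found, in order, on the node's path."""
    remaining = iter(node.path)  # consumed left to right, so order matters
    return sum(1 for tag in frag.path if tag in remaining) / len(frag.path)

def fragment_score(node, frag, alpha=0.5):
    """Step 2: content decides relevance; structure only scales the score."""
    return content_score(node, frag) * (alpha + (1 - alpha) * structure_score(node, frag))

def retrieve(query, nodes):
    """Step 3: sum the per-fragment scores and rank nodes with a nonzero total."""
    frags = decompose(query)
    scored = [(sum(fragment_score(n, f) for f in frags), n) for n in nodes]
    return sorted((p for p in scored if p[0] > 0), key=lambda p: p[0], reverse=True)
```

Under these assumptions, a node whose text matches the keywords but whose tag path differs from the query is still returned, only with a lower score than a node that matches both content and structure.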