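In information retrieval, the distance between keywords reflects how concentrated the information they describe is, and it affects the relevance of search results to the user's needs. Through an in-depth analysis of the structural information inherent in XML data, the concepts of information object and information branch are defined for the data being searched, and a semantic distance model is built on these factors. Computation with this model yields a more accurate relevance score for query results. Experiments on real datasets show that the approach outperforms the existing mainstream algorithms (EASE, SLCA) in query quality while also achieving high query efficiency.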
Keyword proximity reflects the degree of information concentration and affects the relevance between search results and information needs. By deeply analyzing the inherent structural information of XML data, the concepts of information object and information branch are proposed. A keyword proximity model based on these concepts is then established. With the model, the semantic distance between keywords can be measured to compute more relevant results. Extensive experiments on real datasets demonstrate the effectiveness and efficiency of the proposed approach.
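
As a rough illustration of what keyword proximity in an XML tree can mean, the sketch below measures the number of edges between keyword-matching nodes through their lowest common ancestor. It is not the semantic distance model proposed here, whose definitions of information object and information branch are not given in this abstract. The function names (`node_paths`, `tree_distance`, `keyword_distance`) and the sample document are hypothetical, and plain edge counting is an assumed stand-in for the actual model.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

# Hypothetical sample document, used only for illustration.
DOC = """
<bib>
  <book><title>XML Search</title><author>Li</author></book>
  <book><title>Databases</title><author>Wang</author></book>
</bib>
"""


def node_paths(root):
    """Map every element to its list of ancestors, from the root down to itself."""
    paths = {}

    def walk(node, prefix):
        path = prefix + [node]
        paths[node] = path
        for child in node:
            walk(child, path)

    walk(root, [])
    return paths


def tree_distance(path_a, path_b):
    """Count the edges between two nodes via their lowest common ancestor."""
    common = 0
    for x, y in zip(path_a, path_b):
        if x is not y:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)


def keyword_distance(root, keywords):
    """Sum, over keyword pairs, the smallest tree distance between matching nodes.

    Returns None when some keyword matches no element text.
    Assumption: a node matches a keyword if its own text contains it
    (case-insensitive). This is a simplification, not the paper's model.
    """
    paths = node_paths(root)
    matches = []
    for kw in keywords:
        hits = [p for n, p in paths.items() if kw.lower() in (n.text or "").lower()]
        if not hits:
            return None
        matches.append(hits)
    total = 0
    for hits_a, hits_b in combinations(matches, 2):
        total += min(tree_distance(a, b) for a in hits_a for b in hits_b)
    return total


root = ET.fromstring(DOC)
print(keyword_distance(root, ["XML", "Li"]))    # 2: both matches sit under the same <book>
print(keyword_distance(root, ["XML", "Wang"]))  # 4: the matches sit under different <book> elements
```

In this sketch, the smaller distance for "XML" and "Li" reflects that both keywords fall within a single book entry, which is the intuition behind ranking results by how concentrated the matched information is.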