Search engine companies are committed to helping users find target information accurately and quickly, so evaluating search performance has become increasingly important; however, no method yet exists for evaluating performance on rare (long-tail) queries, a problem that has received little attention. This paper addresses that gap. First, three types of features are extracted by analyzing the result data of rare queries. Second, the correlations among these features are analyzed and different combinations of features are tested. Then, two data balancing approaches are proposed to alleviate the severe class imbalance of the data set. Finally, an evaluation method for rare queries is put forward and subsequently improved. Experiments on result data from a real search engine show that the proposed evaluation approach is reasonably effective, and in particular achieves high precision in identifying non-relevant results.
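The abstract does not describe either of its two data balancing approaches or how precision on non-relevant results is computed. As a generic illustration only, and not the authors' method, the sketch below shows random oversampling (duplicating minority-class examples until every class matches the majority count) together with per-class precision. The function names `random_oversample` and `precision` and the label strings are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Generic random oversampling (illustrative, not the paper's method):
    duplicate randomly chosen examples of each smaller class until every
    class has as many examples as the largest one."""
    rng = random.Random(seed)
    target = max(Counter(labels).values())
    # Group samples by their class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw extra copies with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def precision(y_true, y_pred, positive):
    """Share of predictions labeled `positive` that are actually `positive`."""
    hits = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not hits:
        return 0.0
    return sum(1 for t in hits if t == positive) / len(hits)


# Hypothetical toy data: one feature per query result, mostly "relevant".
X = [[0.1], [0.2], [0.3], [0.9]]
y = ["relevant", "relevant", "relevant", "non-relevant"]
bx, by = random_oversample(X, y)
print(Counter(by))
```

Oversampling is only one of several standard options (undersampling the majority class or class-weighted training are common alternatives); it keeps every original example but can encourage overfitting to duplicated minority samples.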