The system first retrieves from the full text of all web pages with Lucene, a tool based on the vector space model, and then reranks the result set with Lemur, a language-model-based tool, to form a second result list. The two lists are merged by linear interpolation, and the top 1000 documents of the merged list are returned as the final result set. To formulate queries, keywords are selected from the 〈title〉 and 〈desc〉 fields of each topic and weighted with a modified tf*idf method. In the official evaluation on 50 topics, automatically constructed queries achieved MAP = 0.3107, P@10 = 0.624, and R-Precision = 0.3672; manually constructed queries achieved MAP = 0.3538, P@10 = 0.684, and R-Precision = 0.4078.
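The keyword weighting and the fusion step can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify how tf*idf was modified, how scores were normalized, or what interpolation weight was used, so the plain `tf * log(N / df)` formula, the min-max normalization, the default `lam=0.5`, and the function names `tfidf_weights`, `min_max`, and `fuse` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_weights(terms, doc_freq, num_docs):
    """Weight each query term by tf * log(N / df).

    Plain tf*idf is an assumption; the source only says "modified tf*idf".
    """
    tf = Counter(terms)
    return {t: tf[t] * math.log(num_docs / doc_freq.get(t, 1)) for t in tf}


def min_max(scores):
    """Rescale scores to [0, 1] so two different scorers become comparable.

    Normalization is an assumption; the source does not describe this step.
    """
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(first_pass, reranked, lam=0.5, top_k=1000):
    """Linearly interpolate two score maps and return the top_k documents.

    first_pass: doc id -> score from the first retrieval run
    reranked:   doc id -> score from the reranking run
    lam:        weight of the first run (hypothetical default)
    """
    a, b = min_max(first_pass), min_max(reranked)
    fused = {
        doc: lam * a.get(doc, 0.0) + (1 - lam) * b.get(doc, 0.0)
        for doc in set(a) | set(b)
    }
    # Sort by descending fused score, breaking ties by document id.
    ranked = sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

Normalizing before interpolation matters here because a vector space score and a language model score (often a log-likelihood) live on different scales; without it, one run would dominate the merged ranking regardless of `lam`.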