With the rapid growth of the World Wide Web, Web information retrieval (IR) has become a widely studied research topic, but the lack of a suitable test and evaluation mechanism has held back progress in Chinese Web IR. Drawing on the experience of building test collections abroad, we constructed CWT, a large-scale Chinese Web test collection, and organized the SEWM Chinese Web retrieval evaluation. We hope that research groups in China and abroad will take part in building and refining CWT, and so jointly advance Chinese Web IR technology. Based on a survey and analysis of existing research in China and abroad, this paper describes in detail the principles and methods used to construct each component of CWT, and reports statistical analysis and experiments on the collection. The construction methodology presented here should serve as a reference for later research and be readily applicable to building future Web corpora.