Analysing user behaviour in large-scale search logs helps improve the performance metrics of search engines. This paper analyses the Baidu open search log in detail from three aspects. First, statistics on query string length and frequency reveal a long-tail effect in query strings: the 10% most frequent query strings account for 70.8% of all queries issued. Second, an analysis of URL click depth and frequency shows that 73% of web pages are clicked only once, which indicates that the Internet contains a large number of rarely visited pages. Finally, an analysis of advanced search usage shows that fewer than 0.12% of users use advanced search, which indicates that users prefer simple and convenient operation.
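The first two statistics can be illustrated with a minimal sketch. It assumes the log has already been parsed into a list of query strings and a list of clicked URLs. The function names and the toy data are hypothetical and are not taken from the Baidu log or from the authors' own tooling. "Top 10%" is interpreted here as the top 10% of distinct query strings ranked by frequency, which is one plausible reading of the figure above.

```python
import math
from collections import Counter


def head_share(queries, head_fraction=0.10):
    """Share of total query volume issued through the most frequent
    distinct query strings (assumed reading of "top 10% of queries")."""
    counts = Counter(queries)
    if not counts:
        return 0.0
    # Frequencies of distinct query strings, most frequent first.
    ranked = sorted(counts.values(), reverse=True)
    # Keep at least one query string in the head.
    head_size = max(1, math.ceil(head_fraction * len(ranked)))
    return sum(ranked[:head_size]) / sum(ranked)


def single_click_fraction(clicked_urls):
    """Fraction of distinct URLs that were clicked exactly once."""
    counts = Counter(clicked_urls)
    if not counts:
        return 0.0
    once = sum(1 for c in counts.values() if c == 1)
    return once / len(counts)


if __name__ == "__main__":
    # Toy data for illustration only, not real log records.
    toy_queries = ["weather"] * 7 + ["news", "map", "mp3"]
    toy_clicks = ["u1", "u1", "u2", "u3", "u4"]
    print(head_share(toy_queries))
    print(single_click_fraction(toy_clicks))
```

On a real log the two lists would be built by streaming the file rather than holding it in memory, but the counting logic would stay the same.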