As the number of Internet users continues to grow, user behavior analysis has become one of the important research methods in the field of Web technology. Removing abnormal clicks from Web user access logs is therefore essential for accurately extracting users' intentions and habits. In this paper, using real-world Web user access logs provided by a commercial search engine company, we analyze several kinds of possibly abnormal clicks: continuous clicks, one IP with many users, and one user with many IPs. We examine them from perspectives such as the degree of concentration of the Web sites a user accesses and the average daily clicks per user. We suggest that, for continuous clicks, user behavior researchers can, depending on the case, filter out either the superfluous repeated clicks or all clicks of the user concerned, while clicks in the one-IP-many-users and one-user-many-IPs cases can be left untouched.
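As an illustration only, the three click patterns named above could be located in a log roughly as follows. This is a minimal sketch under our own assumptions, not the procedure used in the paper: the record layout `(user, ip, url, timestamp)`, the function names, and the 2-second `window_seconds` threshold are all hypothetical, and a continuous click is assumed here to mean a repeat of the same URL by the same user shortly after the previous one.

```python
from collections import defaultdict


def drop_repeated_clicks(records, window_seconds=2.0):
    """Keep the first click of each burst and drop the repeats.

    A repeat is assumed (hypothetically) to be a click by the same user on
    the same URL within window_seconds of that user's previous click on it.
    """
    last_seen = {}  # (user, url) -> timestamp of the most recent click
    kept = []
    for user, ip, url, ts in sorted(records, key=lambda r: r[3]):
        key = (user, url)
        prev = last_seen.get(key)
        last_seen[key] = ts
        if prev is not None and ts - prev <= window_seconds:
            continue  # superfluous repeat inside a burst
        kept.append((user, ip, url, ts))
    return kept


def users_per_ip(records):
    """Count distinct users seen behind each IP (one IP, many users)."""
    seen = defaultdict(set)
    for user, ip, _url, _ts in records:
        seen[ip].add(user)
    return {ip: len(users) for ip, users in seen.items()}


def ips_per_user(records):
    """Count distinct IPs used by each user (one user, many IPs)."""
    seen = defaultdict(set)
    for user, ip, _url, _ts in records:
        seen[user].add(ip)
    return {user: len(ips) for user, ips in seen.items()}


# Made-up sample records; the IPs come from the documentation range.
sample = [
    ("u1", "192.0.2.1", "/a", 0.0),
    ("u1", "192.0.2.1", "/a", 0.5),
    ("u1", "192.0.2.1", "/a", 1.0),
    ("u1", "192.0.2.2", "/b", 5.0),
    ("u2", "192.0.2.1", "/a", 6.0),
]
filtered = drop_repeated_clicks(sample)
```

The sketch covers only the option of removing superfluous repeats; discarding every click of a user who shows continuous clicking would be a separate filter keyed on the user. The two counting functions only report the one-IP-many-users and one-user-many-IPs cases, in line with the suggestion to leave those clicks untouched.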