Web user session identification (also called session reconstruction) is a foundational and critical step in analyzing user access behavior, and its quality has a decisive impact on identifying and discovering users' information needs. The method in common use today segments sessions with a time threshold, but its main problem is that the threshold is difficult to determine accurately for different users. This paper proposes a new clustering-based optimization method for session identification. It first builds a clustering-based optimization model for session identification and then applies an improved K-means algorithm to that model to identify sessions. Experimental results show that the method performs better than traditional methods.
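To make the contrast between a fixed time threshold and a clustering-derived one concrete, here is a minimal sketch. It is not the paper's optimization model or its improved K-means algorithm, neither of which the abstract specifies. It assumes that a user's requests are given as timestamps in seconds and that plain one-dimensional 2-means over the gaps between consecutive requests is an acceptable stand-in. The function names `split_by_threshold` and `gap_threshold_2means`, the 1800-second baseline, and the sample timestamps are all hypothetical.

```python
from typing import List


def split_by_threshold(timestamps: List[float], threshold: float) -> List[List[float]]:
    """Baseline: start a new session when the gap to the previous request exceeds `threshold`."""
    sessions: List[List[float]] = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= threshold:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions


def gap_threshold_2means(timestamps: List[float], iterations: int = 50) -> float:
    """Illustrative stand-in (not the paper's algorithm): cluster one user's
    inter-request gaps into 'short' and 'long' groups with plain 1-D 2-means
    and return the midpoint between the two centroids as that user's threshold."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if not gaps or min(gaps) == max(gaps):
        return float("inf")  # no evidence of a session break
    lo, hi = min(gaps), max(gaps)  # initial centroids
    for _ in range(iterations):
        mid = (lo + hi) / 2
        short = [g for g in gaps if g <= mid]
        long_ = [g for g in gaps if g > mid]
        new_lo = sum(short) / len(short)
        new_hi = sum(long_) / len(long_)
        if (new_lo, new_hi) == (lo, hi):
            break  # assignments stopped changing
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


# Hypothetical fast-browsing user: gaps of a few seconds within a visit,
# roughly ten minutes between visits.
requests = [0, 5, 12, 20, 620, 626, 640, 1300, 1304]

fixed = split_by_threshold(requests, 1800)  # assumed 30-minute global threshold
adaptive = split_by_threshold(requests, gap_threshold_2means(requests))

print(len(fixed))     # 1: the global threshold merges all three visits
print(len(adaptive))  # 3: the per-user threshold separates them
```

Under these assumptions the global threshold is too coarse for this user, while the threshold derived from the user's own gap distribution recovers the three visits. This is the kind of per-user variation the abstract points to as the weakness of a fixed time threshold.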