Existing methods for mining Web users' access paths focus mostly on static snapshots of Web log data. This paper instead seeks to discover a new kind of knowledge from the historical evolution of Web access data: persistent and preferred Web users' access paths (PP-WAPs). A PP-WAP is an access path whose support in the historical Web access sequences (WAS) stays relatively high and fluctuates only slightly for most of the time. The paper first introduces the relevant background and the application areas of PP-WAPs. It then represents the historical WAS sets with an unordered tree structure and gives the definition of PP-WAP together with a description of the mining algorithm. Finally, experiments on a simulated dataset and a real dataset analyze the scalability of the algorithm and the application value of PP-WAPs.
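To make the informal description of a PP-WAP concrete, the following minimal Python sketch filters access paths by how often their support is both high and stable across successive log snapshots. It is an illustration only, not the paper's definition or mining algorithm: it uses a plain dictionary rather than the unordered tree structure, and the function names, the thresholds `min_support`, `max_change`, and `min_fraction`, and the sample data are all hypothetical.

```python
from typing import Dict, List, Tuple

Path = Tuple[str, ...]


def stable_high_fraction(supports: List[float],
                         min_support: float,
                         max_change: float) -> float:
    """Fraction of snapshots where support is high and changed little
    compared with the previous snapshot (hypothetical criterion)."""
    if not supports:
        return 0.0
    good = 0
    for i, s in enumerate(supports):
        # The first snapshot has no predecessor, so its change counts as 0.
        change = abs(s - supports[i - 1]) if i > 0 else 0.0
        if s >= min_support and change <= max_change:
            good += 1
    return good / len(supports)


def find_candidate_paths(history: Dict[Path, List[float]],
                         min_support: float = 0.3,
                         max_change: float = 0.05,
                         min_fraction: float = 0.8) -> List[Path]:
    """Return paths that are high and stable in at least min_fraction
    of the snapshots. All thresholds are assumptions for illustration."""
    return [
        path for path, supports in history.items()
        if stable_high_fraction(supports, min_support, max_change) >= min_fraction
    ]


# Made-up support values per path, one value per historical snapshot.
history = {
    ("home", "news"): [0.42, 0.40, 0.43, 0.41, 0.44],  # high and steady
    ("home", "sale"): [0.10, 0.60, 0.15, 0.55, 0.12],  # strongly fluctuating
    ("home", "faq"): [0.05, 0.06, 0.05, 0.04, 0.05],   # steady but low
}
print(find_candidate_paths(history))
```

Under these assumed thresholds only the first path is kept: the second fluctuates too much and the third is stable but never reaches the support threshold.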