Automatic classification of user navigation patterns helps organize the content of a website to meet the access needs of different users. Web usage mining techniques have been widely applied to this problem, but few studies have integrated Web content mining with Web usage mining. This paper first presents the architecture of a prototype system that combines Web content mining and Web usage mining to classify user navigation patterns. The main processes in the system are: cleaning the raw Web server logs in the dataset to extract user navigation sessions; applying Web usage mining to these sessions to discover representative user navigation patterns; extracting a set of N-grams from the content of each Web page in a pattern; and building N-gram-based user navigation pattern profiles. Finally, experiments on classifying Web user sessions show that the method achieves relatively high classification accuracy with N-gram = 6 and df = 10%.
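The profile-building and classification steps summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions not stated in the abstract: N-grams are taken as character N-grams, "df = 10%" is read as keeping an N-gram in a pattern's profile only if it occurs in at least 10% of that pattern's pages, and sessions are assigned to the profile with the highest Jaccard similarity (a swapped-in measure; the abstract does not name one). The function names `char_ngrams`, `build_profile` and `classify` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def char_ngrams(text: str, n: int = 6) -> Set[str]:
    """Return the set of character N-grams of length n (assumption: character level)."""
    text = " ".join(text.lower().split())  # normalize case and collapse whitespace
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_profile(pages: List[str], n: int = 6, min_df: float = 0.10) -> Set[str]:
    """Build a pattern profile from the contents of the pattern's pages.

    Assumed meaning of df: an N-gram is kept if it appears in at least
    min_df (a fraction) of the pages belonging to the pattern.
    """
    df = Counter()
    for page in pages:
        df.update(char_ngrams(page, n))  # each page counts once per N-gram
    threshold = min_df * len(pages)
    return {gram for gram, count in df.items() if count >= threshold}


def classify(session_pages: List[str], profiles: Dict[str, Set[str]], n: int = 6) -> str:
    """Assign a session to the profile with the highest Jaccard similarity (assumed measure)."""
    grams: Set[str] = set()
    for page in session_pages:
        grams |= char_ngrams(page, n)

    def jaccard(a: Set[str], b: Set[str]) -> float:
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    return max(profiles, key=lambda label: jaccard(grams, profiles[label]))


if __name__ == "__main__":
    # Toy page contents standing in for the pages of two mined patterns.
    profiles = {
        "sports": build_profile(["football match results today", "basketball league match schedule"]),
        "finance": build_profile(["stock market prices fall", "bond market interest rates"]),
    }
    print(classify(["weekend football match report"], profiles))  # prints "sports"
```

Using sets of N-grams keeps the sketch short; a fuller version might weight N-grams by frequency or rank, and would work on page text recovered from the cleaned server logs rather than on hand-written strings.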