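The exponential growth of Internet resources has driven the development of personalized services. To address the shortcomings of traditional user interest modeling methods in accuracy and incremental processing capability, a new user interest modeling method based on conceptual clustering, UIM^2C^2 (User Interest Modeling Method based on Conceptual Clustering), is proposed. The method first builds a suffix tree structure by analyzing the documents a user has visited, then selects different similarity thresholds to merge base clusters at different granularities. The user's interest hierarchy is generated from the inclusion relations among the base clusters merged under the different thresholds. UIM^2C^2 is an incremental, unsupervised concept learning method for documents, so user profiles can be acquired and updated easily. Finally, experiments on the 20NewsGroup dataset verify the effectiveness of UIM^2C^2 in interest prediction.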
The exponential growth of Internet resources has accelerated the development of effective personalization techniques. A new method for modeling user interest, named UIM^2C^2 (User Interest Modeling Method based on Conceptual Clustering), is presented. The method analyzes the documents that each user has browsed and builds a suffix tree from them. Base clusters are then merged at different granularities according to different pairwise base-cluster similarity thresholds. An interest hierarchy is generated from the inclusion relations among the base clusters merged at these granularities. UIM^2C^2 performs incremental, unsupervised concept learning over Web documents, so user profiles can be acquired and updated easily. Experimental results demonstrate the effectiveness of the method in Web page recommendation.
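The merging and hierarchy steps summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the base clusters are taken as already extracted from the suffix tree and are represented as sets of document ids; the similarity measure (shared documents divided by the size of the larger cluster) is an assumed stand-in, since the abstract does not define it; and the names `overlap`, `merge_base_clusters`, and `build_hierarchy` are hypothetical. Suffix tree construction and incremental updating are omitted.

```python
from itertools import combinations


def overlap(a, b):
    """Assumed similarity: shared documents relative to the larger cluster."""
    return len(a & b) / max(len(a), len(b))


def merge_base_clusters(base_clusters, threshold):
    """Merge base clusters whose pairwise similarity reaches the threshold.

    Similar clusters are linked, and every connected group is merged into
    one set of document ids (union-find over cluster indices).
    """
    parent = list(range(len(base_clusters)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(base_clusters)), 2):
        if overlap(base_clusters[i], base_clusters[j]) >= threshold:
            parent[find(i)] = find(j)

    merged = {}
    for i, docs in enumerate(base_clusters):
        merged.setdefault(find(i), set()).update(docs)
    return sorted((frozenset(d) for d in merged.values()), key=sorted)


def build_hierarchy(base_clusters, thresholds):
    """Return one level per threshold (coarse to fine) and parent-child links.

    A lower threshold links more base clusters, so each cluster at a finer
    level is contained in some cluster at the next coarser level.
    """
    levels = [merge_base_clusters(base_clusters, t) for t in sorted(thresholds)]
    links = []
    for coarse, fine in zip(levels, levels[1:]):
        for child in fine:
            # First coarse cluster that contains the child.
            parent = next(p for p in coarse if child <= p)
            links.append((parent, child))
    return levels, links


# Hypothetical base clusters: sets of document ids sharing a phrase.
base = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5, 6}, {7, 8}, {8, 9}]
levels, links = build_hierarchy(base, thresholds=[0.5, 0.6])
for parent, child in links:
    print(sorted(parent), "->", sorted(child))
```

With these made-up base clusters, the threshold 0.5 yields two broad interest clusters and the threshold 0.6 splits them into four narrower ones; the inclusion links between the two levels form a two-level interest hierarchy.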