

Learning Emotional Influence and Susceptibility Based on User Behavior

ISSN: 0254-4164
Journal: Chinese Journal of Computers (《计算机学报》)
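Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] CAS Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; [2] University of Chinese Academy of Sciences, Beijing 100190; [3] China Information Technology Security Evaluation Center, Beijing 100085
Funding: National 973 Program (2012CB316303, 2013CB329602); National 863 Program (2014AA015204); National Natural Science Foundation of China (61232010, 61425016, 61572473, 61572467)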

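Authors: Liao Xiangwen (廖祥文) [1,2], Zheng Houdong (郑候东) [1,2], Liu Shenghua (刘盛华) [3], Shen Huawei (沈华伟) [3], Cheng Xueqi (程学旗) [3], Chen Guolong (陈国龙) [1,2]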

Keywords: DOM tree layered vectors, web page cluster center, locality-sensitive hashing, fast incremental clustering

Abstract:

Clustering web pages by structural similarity is an important technique in web data mining. Traditional web page clustering methods do not define a representation for the center of a web page cluster, so computing point-to-cluster or cluster-to-cluster similarity requires computing the similarity of many point pairs. Such algorithms are generally slower than algorithms that use cluster centers, and they struggle to meet the demands of large-scale, fast incremental clustering. To address this problem, this paper proposes FPC (Fast Page Clustering), a fast incremental web page clustering method. The method first introduces a new way to compute the similarity between two web pages that is 500 times faster than the Simple Tree Matching algorithm. It then gives a representation of the web page cluster center and, on that basis, clusters pages with MKmeans (Merge-Kmeans), a variant of the Kmeans algorithm, which improves efficiency at the clustering-algorithm level. Finally, it uses locality-sensitive hashing to quickly find the most similar cluster in a very large set of web page clusters, which improves efficiency at the incremental-merging level.
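
The abstract says that hashing is used to find the most similar cluster without comparing a page against every cluster, but it does not say which hash family FPC uses or how pages and cluster centers are encoded. The following is therefore only a minimal sketch under stated assumptions: pages and cluster centers are taken to be fixed-length numeric vectors, similarity is taken to be cosine similarity, and random-hyperplane hashing stands in for whatever scheme the paper actually uses. The names `ClusterIndex`, `make_hyperplanes`, `signature`, and `cosine` are hypothetical and do not come from the paper.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes (Gaussian normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, hyperplanes):
    """One bit per hyperplane: 1 if vec lies on its non-negative side, else 0."""
    return tuple(
        1 if sum(h_i * v_i for h_i, v_i in zip(h, vec)) >= 0 else 0
        for h in hyperplanes
    )


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ClusterIndex:
    """Hypothetical single-table LSH index over cluster-center vectors (an assumption)."""

    def __init__(self, dim, num_bits=8, seed=0):
        self.hyperplanes = make_hyperplanes(num_bits, dim, seed)
        self.buckets = defaultdict(list)  # signature -> list of cluster ids
        self.centers = {}                 # cluster id -> center vector

    def add(self, cluster_id, center):
        """Register a cluster center under its hash signature."""
        self.centers[cluster_id] = center
        self.buckets[signature(center, self.hyperplanes)].append(cluster_id)

    def nearest(self, vec):
        """Return the most similar cluster in vec's bucket, or None if the bucket is empty."""
        candidates = self.buckets.get(signature(vec, self.hyperplanes), [])
        if not candidates:
            return None
        return max(candidates, key=lambda cid: cosine(vec, self.centers[cid]))


if __name__ == "__main__":
    index = ClusterIndex(dim=4, num_bits=8, seed=1)
    index.add("a", [1.0, 0.0, 0.0, 0.0])
    index.add("b", [0.0, 1.0, 0.0, 0.0])
    # A positively scaled copy of a center always lands in that center's bucket.
    print(index.nearest([2.0, 0.0, 0.0, 0.0]))
```

Only the centers that share the query's bucket are compared exactly, which is where the saving over a full scan would come from. A single hash table can miss a close center that falls in a neighboring bucket, so a practical index would likely use several tables, and a `None` result would need a fallback such as a full scan or starting a new cluster.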

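Projects with papers in this journal

Research on Social Media Opinion Retrieval Integrating Users' Social Influence and Personalized Features

Journal papers: 10

Research on Modeling and Predicting the Propagation of User Opinions in Online Social Media

Journal papers: 2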

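Journal papers from the same project

Extraction of Chinese Opinion Targets and Opinion Words Based on a Word Alignment Model

Research on Review Spammer Detection Based on Review Relationship Graphs

Emotional Contagion on Twitter

Application of Tensor Decomposition to Measuring User Influence

User Social Influence Analysis Based on Constrained Non-negative Tensor Factorization

Opinion Classification of Chinese Microblogs Based on Convolutional Neural Networks

A Knowledge Graph Reasoning Algorithm Based on Path Tensor Decomposition

Social Influence Analysis Incorporating User Opinions

Research on Review Spammer Identification Based on Reviewer Relationships

A Causal-Model-Based Method for Computing and Predicting Topic Popularity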

Journal information

Chinese Journal of Computers (《计算机学报》)
PKU Core Journal (2011 edition)

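Supervising body: Chinese Academy of Sciences
Sponsors: China Computer Federation; Institute of Computing Technology, Chinese Academy of Sciences
Editor-in-chief: Sun Ninghui (孙凝晖)
Address: 6 Kexueyuan South Road, Zhongguancun, Beijing
Postal code: 100190
Email: cjc@ict.ac.cn
Phone: 010-62620695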

International standard serial number: ISSN 0254-4164
Domestic unified serial number: CN 11-1826/TP
Postal distribution code: 2-833

Awards:
"Double-Effect" journal of the China Journal Matrix (中国期刊方阵“双效”期刊)

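Indexed in the following domestic and international databases:
Mathematical Reviews, online edition (USA); Abstract and Citation Database (Netherlands); Engineering Index (USA); Cambridge Scientific Abstracts (USA); Japan Science and Technology Agency database (Japan); China Science and Technology Core Journals (China); PKU Core Journals, 2000, 2004, 2008, 2011, and 2014 editions (China)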

Citation count: 48433