As the number of registered microblog users grows, detecting inactive accounts and automatically judging user activity have significant commercial value. This paper proposes an automatic detection algorithm and validates it experimentally. The core of the algorithm is a set of four determining factors that influence user activity, each of which can be computed from user behavior. The algorithm comprises two models: the User Active Degree Probability Hierarchical Model (ADPHM), which estimates the probability that a user is inactive, and the User Scoring Model (USM), which assigns each user an activity score. The experimental dataset contains information on 2,316,281 users and their 141,322,019 microblog posts, crawled from Sina Weibo. The results show that the algorithm detects inactive accounts automatically in linear time and can help improve user credibility evaluation systems.