Using user data crawled from Sina Microblog (Sina Weibo), this paper analyzes the growth patterns of three user characteristics: the number of followers, the number of friends (accounts followed), and the number of statuses (posts). Overall, all three characteristics grow linearly with time, and their growth rates, rounded to integers, follow a power-law distribution. Individual users mainly exhibit either sustained growth or explosive growth, and explosive growth can be further divided, according to the stage at which the burst occurs, into four patterns: early-stage, middle-stage, late-stage, and step-like growth. A K-means clustering algorithm based on vector cosine similarity is applied to the time series of these characteristics, with users grouped by different rankings and different initial scales, in order to count the users in each growth pattern. The results show that users with high growth rates grow mainly in the explosive pattern, whereas users with large counts grow mainly in the sustained pattern. Finally, by analyzing the process of explosive growth in follower numbers and comparing how the numbers of retweets and comments on a user's posts grow, the paper proposes causes for the explosive growth of a user's followers.
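As an illustration of the clustering step, the following is a minimal sketch of K-means with cosine similarity applied to growth time series. It is not the authors' implementation. The function name `cosine_kmeans`, the farthest-first initialization, and the use of per-interval increments as the input vectors are all assumptions made for this example. The abstract specifies only that K-means is based on vector cosine similarity.

```python
import numpy as np

def cosine_kmeans(series, k, n_iter=100):
    """Cluster rows of `series` by cosine similarity (hypothetical helper)."""
    x = np.asarray(series, dtype=float)
    # Scale each series to unit length so a dot product equals cosine similarity.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x = x / np.where(norms == 0, 1.0, norms)

    # Assumed initialization: farthest-first, which is deterministic.
    chosen = [x[0]]
    for _ in range(1, k):
        best_sim = np.max(x @ np.array(chosen).T, axis=1)
        chosen.append(x[best_sim.argmin()])
    centers = np.array(chosen)

    labels = np.full(len(x), -1)
    for _ in range(n_iter):
        # Assign each series to the center with the highest cosine similarity.
        new_labels = (x @ centers.T).argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Move each center to the renormalized mean of its members.
        for j in range(k):
            members = x[labels == j]
            if len(members) == 0:
                continue
            mean = members.mean(axis=0)
            length = np.linalg.norm(mean)
            if length > 0:
                centers[j] = mean / length
    return labels, centers

# Made-up follower increments per time interval, for illustration only.
increments = [
    [5, 5, 5, 5, 5, 5],    # sustained growth
    [2, 2, 3, 2, 2, 2],    # sustained growth
    [40, 3, 1, 1, 0, 1],   # early-stage burst
    [90, 5, 2, 1, 1, 1],   # early-stage burst
    [1, 0, 1, 1, 4, 60],   # late-stage burst
    [0, 1, 1, 2, 3, 30],   # late-stage burst
]
labels, centers = cosine_kmeans(increments, k=3)
print(labels)
```

Because each series is scaled to unit length, the clustering depends on the shape of the growth curve rather than on its absolute size, which is why a user gaining 40 followers in a burst and one gaining 90 can fall into the same pattern.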