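Analysis of real user data from Sina Weibo shows that the distributions of three user features (the number of followers, the number of followees, and the number of posts) all exhibit two-segment power-law behavior, and that these distributions differ across user types. Fitting the data with the double Pareto lognormal (DPLN) distribution gives better results than fitting with either the lognormal or the power-law distribution. User active time follows an exponential distribution, and for users with different active times each of the three features approximately follows a lognormal distribution; the growth rates of the features follow a lognormal distribution and are independent of the features' own scale. These properties agree with the DPLN model, so the model can well explain the mechanism that shapes the distributions of follower, followee, and post counts.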
Based on real user data from Sina Weibo, this paper studies the distributions of three user characteristics: the number of followers, the number of friends (accounts followed), and the number of statuses (posts). All three exhibit a double (two-segment) power-law distribution, and the distributions differ across user types. The double Pareto lognormal (DPLN) distribution is found to fit the overall distribution of each characteristic better than either the lognormal distribution or the power-law distribution. Moreover, the user activity span is found to be exponentially distributed, and within each activity span the three characteristics approximately follow a lognormal distribution. Furthermore, the growth rates of these characteristics follow a lognormal distribution and are independent of the characteristics' current size. These properties are consistent with the DPLN model, which can therefore help explain the formation mechanism of the distributions of followers, friends, and statuses in microblogs.
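The mechanism summarized above (exponentially distributed activity spans combined with size-independent lognormal growth) can be illustrated with a small simulation. The following is a minimal sketch, not the paper's fitting procedure: it assumes log-size grows as a Brownian motion with drift, observed after an exponentially distributed time, which is the standard generative construction of a DPLN variable. The function names, the parameter values, and the use of a Hill estimator to check the upper-tail exponent are all illustrative assumptions that do not come from the paper.

```python
import math
import random


def tail_exponents(drift, sigma, lam):
    """Power-law exponents (alpha, beta) of the upper and lower tails.

    Assumed model: log-size performs Brownian motion with the given drift
    and volatility sigma, stopped at an Exp(lam) time. Then alpha and -beta
    are the roots of (sigma^2 / 2) z^2 + drift * z - lam = 0.
    """
    a = 0.5 * sigma ** 2
    disc = math.sqrt(drift ** 2 + 4.0 * a * lam)
    alpha = (-drift + disc) / (2.0 * a)
    beta = (drift + disc) / (2.0 * a)
    return alpha, beta


def simulate_sizes(n, drift, sigma, lam, init_mean=0.0, init_sd=0.3, seed=0):
    """Draw n sizes from the assumed generative process."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n):
        t = rng.expovariate(lam)                # activity span ~ Exp(lam)
        log_x0 = rng.gauss(init_mean, init_sd)  # lognormal initial size
        # Accumulated log-growth over the span: Normal(drift*t, sigma^2*t),
        # independent of the current size.
        log_growth = rng.gauss(drift * t, sigma * math.sqrt(t))
        sizes.append(math.exp(log_x0 + log_growth))
    return sizes


def hill_estimate(sizes, top_fraction=0.01):
    """Hill estimator of the upper-tail exponent from the largest values."""
    ordered = sorted(sizes, reverse=True)
    k = max(2, int(len(ordered) * top_fraction))
    threshold = math.log(ordered[k])
    mean_excess = sum(math.log(x) - threshold for x in ordered[:k]) / k
    return 1.0 / mean_excess


if __name__ == "__main__":
    drift, sigma, lam = 0.1, 0.5, 0.5  # illustrative values only
    alpha, beta = tail_exponents(drift, sigma, lam)
    sizes = simulate_sizes(200_000, drift, sigma, lam)
    print("theoretical upper-tail exponent:", alpha)
    print("theoretical lower-tail exponent:", beta)
    print("Hill estimate of upper-tail exponent:", hill_estimate(sizes))
```

Under these assumptions, users observed at a fixed activity span have lognormally distributed sizes, while mixing over exponentially distributed spans produces power-law behavior in both tails, which is the qualitative pattern the abstract reports.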