This paper first presents a comprehensive analysis of the information on domestic (Chinese) microblog platforms: it defines microblog information, identifies which parts of this complex information are most important, and describes what the information contains and how it is organized. It then takes Sina Microblog (Sina Weibo) as the object of study and designs a crawler based on the Sina Microblog API to collect a large volume of user information. This user information is analyzed quantitatively using three measures per user: the number of accounts followed, the number of followers, and the number of microblogs posted. Finally, based on the results of this analysis, the paper proposes different tag recommendation methods for user groups with different characteristics.
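As an illustration of the quantitative step, the following minimal sketch groups users by the three measures named above. It is not the procedure used in the paper: the group labels, the thresholds (`min_posts`, `min_followers`), the `UserStats` record, and the sample values are all hypothetical and chosen only to show the shape of such an analysis.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class UserStats:
    """Hypothetical record holding the three per-user measures."""
    following: int  # number of accounts the user follows
    followers: int  # number of accounts following the user
    posts: int      # number of microblogs the user has published


def classify(user: UserStats, min_posts: int = 50, min_followers: int = 1000) -> str:
    """Assign a user to an illustrative group (thresholds are assumptions)."""
    if user.posts < min_posts:
        return "low_activity"
    if user.followers >= min_followers and user.followers > user.following:
        return "broadcaster"
    return "regular"


def group_sizes(users):
    """Count how many users fall into each illustrative group."""
    return Counter(classify(u) for u in users)


# Made-up sample values, for demonstration only.
sample = [
    UserStats(following=120, followers=15, posts=8),
    UserStats(following=300, followers=25000, posts=4200),
    UserStats(following=180, followers=210, posts=640),
]
print(group_sizes(sample))
```

Under these assumptions, a separate tag recommendation method could then be attached to each group label; in practice the thresholds would be derived from the observed distributions of the crawled data.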