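To address the problem of classifying microblog users, this paper proposes the concept of a time-slice micro-element and builds a time-slice micro-element model. The users involved in the microblogs within each time slice are studied to obtain the user interest vectors inside that micro-element; the interest vectors from all time slices are then combined, and the resulting vectors for the whole period are passed through naive Bayes classification twice to obtain the user classification for the whole period. In analyzing microblog content, the method avoids the traditional approach of building the user network solely from system tags: it uses natural language processing to extract each user's interest directions and form a user interest vector, which is then analyzed and classified with an improved naive Bayes algorithm. Finally, the proposed method is evaluated experimentally, following its detailed steps. The results show that the time-slice-based method classifies the users involved in a large-scale microblog corpus effectively and with reasonable accuracy, and that it helps advance research on microblog user classification.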
Focusing on the classification of microblog users, we propose the concept of a time-slice micro-element and establish a time-slice micro-element model. By studying the users involved in the microblogs within each time slice, we derive the classification of users over the whole period from the classification results for the individual time slices. At the same time, to avoid the traditional approach of relying solely on system labels to form a user network, we apply natural language processing to microblog content to extract the directions of user interest and form a user interest vector. We then analyze the user interest vectors and classify users with an improved naive Bayes classification algorithm. Experimental results show that the time-slice-based classification method for microblog users effectively avoids the high computational complexity of processing large data sets, and that it helps advance research on microblog user classification.
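The pipeline outlined in the abstract (split posts into time slices, build a per-user interest vector in each slice, merge the slices, then classify) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `(user, timestamp, keywords)` post format, the fixed `slice_len`, the helper names, and the toy data are hypothetical, and keyword extraction is taken as already done. The abstract does not specify the improved naive Bayes variant, so a plain multinomial naive Bayes with Laplace smoothing stands in for it.

```python
import math
from collections import Counter, defaultdict


def slice_index(timestamp, start, slice_len):
    """Map a timestamp to the index of the time slice that contains it."""
    return int((timestamp - start) // slice_len)


def interest_vectors_by_slice(posts, start, slice_len):
    """Build {slice index: {user: keyword Counter}} from (user, timestamp, keywords) posts."""
    slices = defaultdict(lambda: defaultdict(Counter))
    for user, timestamp, keywords in posts:
        slices[slice_index(timestamp, start, slice_len)][user].update(keywords)
    return slices


def merge_slices(slices):
    """Sum each user's per-slice vectors into one vector for the whole period."""
    merged = defaultdict(Counter)
    for users in slices.values():
        for user, vector in users.items():
            merged[user].update(vector)
    return merged


class NaiveBayes:
    """Plain multinomial naive Bayes with Laplace smoothing (a stand-in, not the paper's variant)."""

    def fit(self, vectors, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for vector, label in zip(vectors, labels):
            self.term_counts[label].update(vector)
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        return self

    def predict(self, vector):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for term, freq in vector.items():
                if term in self.vocab:  # terms unseen in training are ignored
                    score += freq * math.log((counts[term] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: timestamps in seconds, one-minute slices.
posts = [
    ("alice", 5, ["football", "league"]),
    ("alice", 65, ["football", "goal"]),
    ("bob", 10, ["phone", "chip"]),
    ("bob", 70, ["chip", "laptop"]),
    ("carol", 15, ["goal", "league"]),
    ("carol", 75, ["football"]),
]
slices = interest_vectors_by_slice(posts, start=0, slice_len=60)
merged = merge_slices(slices)
model = NaiveBayes().fit([merged["alice"], merged["bob"]], ["sports", "tech"])
print(model.predict(merged["carol"]))  # prints "sports"
```

Because each slice is processed independently before the merge, the per-slice step could be run in parallel or incrementally as new posts arrive; whether the paper exploits this is not stated in the abstract.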