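Predicting senior users among the mute users of a social network from their profile tags helps address a problem that traditional prediction methods, which rely on user-generated text, handle poorly. Effectively identifying the seniors among mute users makes it possible to offer them age-friendly services, such as age-friendly user interfaces, age-appropriate information, and friend recommendations, and so reduces the burden older people face in using social networks. Using two methods, Word2vec and LDA, this paper extracts feature vectors from the tags of social network users and applies six different classification algorithms to predict senior users in the network. Using TF-IDF to compute a popularity index for the tag words of each age group, this paper finds that the popular tag words differ markedly across age groups, which indicates that predicting a user's age group from tag words is feasible to some extent. Extracting tag features with Word2vec and classifying with a simple logistic regression or random forest model can effectively determine whether a mute user is a senior: without using any social network topology or user-generated text, the classification accuracy reaches 66%.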
This paper addresses the problem of identifying senior users among the mute users of social media by using the tags in their profiles. Finding mute seniors helps provide these users with a suitable user interface and suitable information recommendations, and can reduce the burden of using the social network for them. We use Word2vec and LDA to extract user features and predict whether a user is a senior citizen. Using TF-IDF to compute the popularity of tags in different age groups, we find distinct differences among the groups, which suggests that tags can be used to predict a user's age group. Experimental results demonstrate that extracting features with Word2vec and classifying with random forest or logistic regression can predict whether a user is a senior with an accuracy of 66%, without any user-generated content or network topology.
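As an illustration of the TF-IDF step, the following minimal sketch scores tags per age group by treating each group's pooled tags as one document. The exact popularity formula is not given here, so the plain `tf * log(N / df)` weighting, the function names `tag_popularity` and `top_tags`, and the toy tags are all assumptions made for illustration, not the paper's implementation or data.

```python
import math
from collections import Counter


def tag_popularity(tags_by_group):
    """Return {group: {tag: score}} using a plain TF-IDF weighting.

    Assumption: each age group is one "document" made of all tags
    attached to the users in that group.
    """
    n_groups = len(tags_by_group)
    counts = {group: Counter(tags) for group, tags in tags_by_group.items()}

    # Document frequency: in how many age groups does each tag occur?
    doc_freq = Counter()
    for group_counts in counts.values():
        doc_freq.update(group_counts.keys())

    scores = {}
    for group, group_counts in counts.items():
        total = sum(group_counts.values())
        scores[group] = {
            tag: (n / total) * math.log(n_groups / doc_freq[tag])
            for tag, n in group_counts.items()
        }
    return scores


def top_tags(scores, group, k=3):
    """Return the k highest-scoring tags of a group (ties broken alphabetically)."""
    ranked = sorted(scores[group].items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:k]]


# Hypothetical toy data, not taken from the paper's dataset.
toy_tags = {
    "senior": ["health", "calligraphy", "travel", "health", "music"],
    "young": ["gaming", "travel", "music", "anime", "gaming"],
}
scores = tag_popularity(toy_tags)
print(top_tags(scores, "senior", k=2))
print(top_tags(scores, "young", k=2))
```

Tags shared by every group (here `travel` and `music`) receive a score of zero under this weighting, so only group-specific tags surface as popular. That is the kind of between-group difference that motivates using tags as predictive features.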