To address gender prediction for mute users in social media, an approach that uses the interest tags in users' profiles was proposed. Because such tags are sparse and ambiguous, a framework was designed that infers a user's gender from conceptual classes. First, the interest tags were divided into a set of conceptual classes according to social-psychological characteristics. Second, the conceptual classes were expanded through association mining. Finally, the conceptual classes were used to condense the users' feature space. Experiments were conducted on a real data set extracted from Sina Weibo. The results show that the proposed approach distinguishes the gender of mute users effectively, reaching an accuracy of 71% without using any micro-blog information.
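The three steps of the framework can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the tag names, the seed classes, the function names `expand_classes` and `concept_features`, and the 0.6 confidence threshold. The co-occurrence confidence rule is a simple stand-in for the association mining step, and the gender classifier that would consume the resulting features is omitted.

```python
from collections import Counter

# Step 1 (assumed): hand-built seed conceptual classes.
# These classes and tags are hypothetical examples.
SEED_CLASSES = {
    "sports": {"football", "basketball"},
    "beauty": {"makeup", "fashion"},
}


def expand_classes(seed_classes, user_tags, min_confidence=0.6):
    """Step 2 (stand-in for association mining).

    A non-seed tag joins a class when the share of users carrying that tag
    who also carry at least one seed tag of the class reaches min_confidence.
    """
    tag_count = Counter()        # number of users carrying each tag
    tag_class_count = Counter()  # users carrying the tag plus a seed of the class
    for tags in user_tags:
        unique = set(tags)
        tag_count.update(unique)
        for name, seeds in seed_classes.items():
            if unique & seeds:
                for tag in unique:
                    tag_class_count[(tag, name)] += 1

    seeded = set().union(*seed_classes.values())
    expanded = {name: set(seeds) for name, seeds in seed_classes.items()}
    for tag, count in tag_count.items():
        if tag in seeded:
            continue
        for name in seed_classes:
            if tag_class_count[(tag, name)] / count >= min_confidence:
                expanded[name].add(tag)
    return expanded


def concept_features(tags, classes):
    """Step 3: replace the sparse tag vector with one count per class.

    Classes are ordered alphabetically by name so the vector layout is stable.
    """
    unique = set(tags)
    return [len(unique & classes[name]) for name in sorted(classes)]


# Toy data: each inner list is the tag set of one hypothetical user.
USER_TAGS = [
    ["football", "nba"],
    ["basketball", "nba"],
    ["makeup", "skincare"],
    ["fashion", "skincare", "travel"],
    ["travel"],
]

expanded = expand_classes(SEED_CLASSES, USER_TAGS)
features = concept_features(["nba", "skincare", "travel"], expanded)
```

In this toy data, `nba` and `skincare` always co-occur with a seed tag and are absorbed into their classes, while `travel` co-occurs with a seed tag for only half of its users and stays outside. A user is then described by one count per conceptual class rather than by one dimension per raw tag, which is the sense in which the feature space is condensed.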