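Topic detection in social media has long been an active research problem. Because social data are noisy and heterogeneous, time-sensitive, and semantically ambiguous, it is also a difficult one. This study models social text data with a complex network and detects social topics by combining that model with an overlapping community detection method based on agglomerative hierarchical clustering of maximal cliques. In the text modeling step, topic words are quantified by a custom burst coefficient, that is, a topic word is treated as a keyword with a preferred distribution in the time domain; topic words are then linked by a custom correlation coefficient to build the topic network. To make the custom coefficients better suited to a dynamic data environment, they were tuned through adaptability tests on real data. The EAGLE overlapping community detection method was evaluated on public datasets, where its Q-function values were clearly better than those of several current overlapping community detection strategies. The study also performed topic analysis on a sample of 600,000 social media posts by teenagers and visualized the results.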
Topic detection in social media is a hot yet challenging issue in social computing, given that most data there are heterogeneous, time-evolving, and linguistically ambiguous. In this paper, the authors explore the idea of achieving this goal through complex network modeling, which has demonstrated excellent interpretability of the real world. Specifically, a complex network was constructed from preprocessed topic words, and two parameters, namely the emergency (burst) and correlation coefficients, were introduced to filter social data through the network and to determine their possible correlations. This approach was then applied to analyze 600,000 messages posted by teenage users on Weibo.com, identifying overlapping communities with the help of the well-established algorithm EAGLE. Compared with other popular approaches such as CONGO and Peacock, the method proposed here obtained much better Q-values.
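As an illustration of the Q-value comparison mentioned above, the following is a minimal sketch of the extended modularity EQ commonly used to score overlapping community covers. It assumes that the Q function in this paper is that measure, which the abstract does not state. The function name `extended_modularity` and the edge-list input format are hypothetical choices made for this example, not the authors' implementation.

```python
def extended_modularity(edges, communities):
    """Extended modularity EQ of an overlapping cover (illustrative sketch).

    Assumes an undirected, unweighted graph without self-loops:
      EQ = 1/(2m) * sum_c sum_{i,j in c} (A_ij - k_i*k_j/(2m)) / (O_i*O_j)
    where O_v is the number of communities that contain node v.
    """
    # Build an adjacency map from the edge list.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of edges
    if m == 0:
        return 0.0

    # Count how many communities each node belongs to (O_v).
    membership = {}
    for community in communities:
        for node in set(community):
            membership[node] = membership.get(node, 0) + 1

    total = 0.0
    for community in communities:
        # Ignore nodes that do not appear in any edge.
        nodes = [v for v in set(community) if v in adj]
        for i in nodes:
            for j in nodes:
                a_ij = 1.0 if j in adj[i] else 0.0
                expected = len(adj[i]) * len(adj[j]) / (2 * m)
                total += (a_ij - expected) / (membership[i] * membership[j])
    return total / (2 * m)


# Toy topic network: two triangles of topic words joined by one edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_cover = [{0, 1, 2}, {3, 4, 5}]
toy_eq = extended_modularity(toy_edges, toy_cover)
```

When no node belongs to more than one community, every O_v equals 1 and the measure reduces to the ordinary Newman modularity, so the same function can score both overlapping and non-overlapping partitions.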