The increasingly rich geographical location information in online social networks prompts a rethinking of traditional public opinion sensing and information retrieval techniques. Taking the online social platform Twitter as the object of study, this paper aims to discover geographical topics in social networks. The work consists of two parts: (1) analysis of the topical and regional properties of social networks, and (2) discovery of geographical topics. First, based on the relationships among users, locations, and topics, we show that social network users exhibit both regional and topical characteristics, analyze how geographical location and topic influence term usage, and abstract the association between regions and topics. Second, according to the spatial correlation characteristics of geographical topics, we consider both the text content published by users and their geographical location information, and construct the Geographical Textual Topic Discovering model (GTTD) following the topic model approach. GTTD introduces the close relationships among users, topics, and geographical locations into a single topic discovery framework. Finally, Gibbs sampling is applied to estimate the model parameters. Experiments on a real Twitter dataset show that the proposed GTTD model discovers geographical topics in social networks effectively and shows an advantage over the LGTA and Geofolk models on the perplexity metric.
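For readers unfamiliar with the perplexity metric used in the comparison, the following is a minimal sketch of how perplexity is commonly computed for a topic model. It assumes the standard LDA-style definition, in which the probability of a word in a document is the mixture of per-topic word distributions weighted by the document's topic proportions; the abstract does not state the exact formulation used for GTTD. The names `perplexity`, `theta`, and `phi` are illustrative and not taken from the paper. Lower perplexity indicates a better fit to the evaluated text.

```python
import math


def perplexity(docs, theta, phi):
    """Perplexity of a topic model on a set of documents.

    docs  : list of documents, each a list of integer word ids
    theta : theta[d][z] = proportion of topic z in document d (assumed name)
    phi   : phi[z][w]   = probability of word w under topic z (assumed name)
    """
    log_likelihood = 0.0
    n_tokens = 0
    n_topics = len(phi)
    for d, words in enumerate(docs):
        for w in words:
            # p(w | d) = sum over topics of p(z | d) * p(w | z)
            p_w = sum(theta[d][z] * phi[z][w] for z in range(n_topics))
            log_likelihood += math.log(p_w)
            n_tokens += 1
    # Exponentiated negative average log-likelihood per token
    return math.exp(-log_likelihood / n_tokens)


# Toy check: with every word equally likely over a 4-word vocabulary,
# the model is exactly as uncertain as a uniform guess among 4 words.
uniform_phi = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]
print(perplexity([[0, 1, 2, 3]], [[0.5, 0.5]], uniform_phi))
```

A model that assigns higher probability to the observed words yields a smaller value, which is the sense in which one model can be said to have an advantage over another on this metric.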