Spatial location information usually reflects the geographic activity patterns of the people who use smart mobile devices, and objectively captures the spatio-temporal distribution of their activity. Existing microblog data-fetching methods are subject to the access restrictions placed on ordinary users, which easily leads to missing target data. To address this problem, this paper proposes a spatial partition strategy for the target area: before fetching, the target area is divided into a grid so that the data for all grid cells can be fetched simultaneously. The location microblog data fetched per grid cell are analyzed statistically to extract crowd activity information and, combined with the point-of-interest (POI) type of each post's location, to characterize the spatio-temporal distribution and activity features of location microblog users. The method narrows each collection area, enables efficient parallel fetching of location microblog data, and keeps the collection ranges overlapping, ensuring the completeness of the collected data to the greatest extent possible.
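The grid partition described above can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not state: the target area is a latitude/longitude bounding box, each cell is queried by a centre point and a radius, and the radius is chosen as the distance from the centre to the farthest cell corner so that neighbouring query circles overlap. The names `Cell`, `make_grid`, `fetch_all` and the caller-supplied `fetch_cell` callback are hypothetical, and duplicates produced by the overlap are dropped by post `id`. This is an illustration of the idea, not the paper's implementation.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0


@dataclass(frozen=True)
class Cell:
    lat: float       # cell centre latitude, degrees
    lon: float       # cell centre longitude, degrees
    radius_m: float  # query radius, metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def make_grid(south, west, north, east, rows, cols):
    """Split a bounding box into rows x cols cells.

    Assumption: the radius is the distance from the cell centre to its
    farthest corner, so at city scale the query circle covers the whole
    cell and neighbouring circles overlap.
    """
    dlat = (north - south) / rows
    dlon = (east - west) / cols
    cells = []
    for r in range(rows):
        for c in range(cols):
            lat0 = south + r * dlat
            lon0 = west + c * dlon
            clat = lat0 + dlat / 2
            clon = lon0 + dlon / 2
            corners = [
                (lat0, lon0), (lat0, lon0 + dlon),
                (lat0 + dlat, lon0), (lat0 + dlat, lon0 + dlon),
            ]
            radius = max(haversine_m(clat, clon, la, lo) for la, lo in corners)
            cells.append(Cell(clat, clon, radius))
    return cells


def fetch_all(cells, fetch_cell, max_workers=8):
    """Fetch all cells concurrently and merge the results.

    fetch_cell is a hypothetical caller-supplied function that takes a Cell
    and returns a list of dicts, each with a unique "id" key.
    """
    merged = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for posts in pool.map(fetch_cell, cells):
            for post in posts:
                # Overlapping circles return the same post more than once.
                merged.setdefault(post["id"], post)
    return list(merged.values())
```

A caller would build the cells with `make_grid` and pass `fetch_all` a `fetch_cell` function wrapping whatever nearby-posts query is available. Using threads suits this task because the work is dominated by network waits rather than computation.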