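With the widespread use of Location Based Services (LBS), privacy protection has become a pressing problem for the further development of LBS, and spatial-temporal K-anonymity has become a mainstream approach. LBS application servers store the historical anonymity datasets generated when users issue continuous queries. By analyzing these historical anonymity datasets over large spatial and temporal scales, spatial prediction can support personalized LBS services. This paper proposes a method that combines two typical techniques from probabilistic statistics and data mining, Markov chains and sequential rules, to predict specific spatial regions contained in anonymity datasets. The method consists of four steps: (1) analyzing the predictive characteristics of sequential rules and Markov processes; (2) constructing n-step transition probability matrices, using the normalized confidences of the sequential rules mined from the anonymity datasets as the initial transition probabilities; (3) designing a coarse spatial prediction method based on the n-step transition probability matrices, together with an improved spatial prediction method for a specified exact path; (4) validating the performance of the method experimentally. The results show that the method builds the model structure quickly and allows the closeness between the precise spatial prediction probability and the true probability to be adjusted flexibly, so it is usable in practice.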
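Steps (2) and (3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: sequential rules are assumed to arrive as a dictionary mapping a (predecessor region, successor region) pair to its confidence; the helper names `transition_matrix`, `n_step` and `coarse_predict` and the `min_conf` threshold parameter are hypothetical; and a region with no surviving outgoing rule is made absorbing (a self-loop), which is our own choice. Each row of rule confidences is normalized to sum to 1, and the n-step matrix is the n-th power of the one-step matrix (Chapman-Kolmogorov equation).

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def transition_matrix(rules: Dict[Tuple[str, str], float], regions: List[str],
                      min_conf: float = 0.0) -> Matrix:
    """One-step transition matrix from normalized sequential-rule confidences.

    Hypothetical input format: {(a, b): conf} means "region a is followed by b".
    Rules below min_conf are dropped before row normalization.
    """
    idx = {r: i for i, r in enumerate(regions)}
    n = len(regions)
    p = [[0.0] * n for _ in range(n)]
    for (a, b), conf in rules.items():
        if conf >= min_conf:
            p[idx[a]][idx[b]] = conf
    for i in range(n):
        total = sum(p[i])
        if total == 0:
            p[i][i] = 1.0  # assumption: a region with no outgoing rule is absorbing
        else:
            p[i] = [v / total for v in p[i]]  # normalize confidences into probabilities
    return p


def mat_mul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product x @ y."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def n_step(p: Matrix, steps: int) -> Matrix:
    """n-step transition matrix P^(n) = P^n, starting from the identity."""
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        result = mat_mul(result, p)
    return result


def coarse_predict(p: Matrix, regions: List[str], start: str, steps: int) -> Tuple[str, float]:
    """Most probable region after `steps` transitions; the route taken is not recovered."""
    row = n_step(p, steps)[regions.index(start)]
    best = max(range(len(row)), key=row.__getitem__)
    return regions[best], row[best]


if __name__ == "__main__":
    regions = ["A", "B", "C"]
    rules = {("A", "B"): 0.375, ("A", "C"): 0.125, ("B", "C"): 0.5, ("C", "A"): 0.5}
    p = transition_matrix(rules, regions)
    print(coarse_predict(p, regions, "A", 2))  # ('C', 0.75)
    # Raising the confidence threshold prunes A -> C and changes the prediction.
    print(coarse_predict(transition_matrix(rules, regions, min_conf=0.2), regions, "A", 2))  # ('C', 1.0)
```

The example shows how a confidence threshold on the sequential rules changes the transition matrix and therefore the prediction, which is one way the adjustable accuracy described above could arise.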
Recently, spatial-temporal K-anonymity has become a prominent technique for protecting user privacy in Location Based Services (LBS) applications because it is easy to implement and broadly applicable. Analyzing spatial prediction scenarios on spatial-temporal K-anonymity datasets is important for making better use of LBS anonymity datasets in personalized services. In this paper, we present a spatial prediction method that combines the strengths of probabilistic statistics and data mining techniques. The process is divided into four phases. In Phase 1, the predictive characteristics of sequential rules and Markov chains are studied, and an algorithm is designed to compute n-step transition probability matrices from the normalized sequential rules mined from sequences in spatial-temporal K-anonymity datasets. In Phase 2, simple predictions are made directly from the n-step transition probability matrices of example datasets; this approach, however, has a drawback: it cannot reveal the full path behind a simple prediction, which is very important for analyzing the behavior patterns of LBS users. Therefore, in Phase 3, a precise prediction algorithm is designed that recursively discovers the detailed k-step path and its transition probability from the detailed (k-1)-step path and the simple k-step transition, which involves only the start and stop nodes. In Phase 4, simulation experiments are conducted; the results demonstrate that the proposed approach builds the predictive model faster than traditional methods and allows the prediction accuracy to be adjusted flexibly by setting different confidence thresholds for the sequential rules of the datasets.
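The Phase 3 recursion can be read in more than one way; the following is a minimal sketch of one reading, assuming a first-order Markov chain over integer region indices with a row-stochastic one-step matrix `P`. Each detailed k-step path is built by extending a (k-1)-step prefix with one more transition, so its probability is the prefix probability times that transition probability. Summing every detailed k-step path with the same start and stop nodes recovers the simple k-step entry P^k[start][stop], so each exact path's share of that coarse probability can be reported. The function names `path_probability`, `k_step_paths` and `path_share` and the toy matrix are hypothetical illustrations, not taken from the paper.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def path_probability(p: Matrix, path: List[int]) -> float:
    """Probability of the exact route path[0] -> ... -> path[-1].

    Recursive form: prob(k-step path) = prob((k-1)-step prefix) * p[prev][last].
    """
    if len(path) < 2:
        return 1.0
    return path_probability(p, path[:-1]) * p[path[-2]][path[-1]]


def k_step_paths(p: Matrix, start: int, stop: int, k: int) -> List[Tuple[List[int], float]]:
    """All k-step routes from start to stop with nonzero probability.

    Every (k-1)-step prefix is extended by each reachable next region.
    """
    frontier = [([start], 1.0)]
    for _ in range(k):
        frontier = [(path + [j], prob * p[path[-1]][j])
                    for path, prob in frontier
                    for j in range(len(p)) if p[path[-1]][j] > 0]
    return [(path, prob) for path, prob in frontier if path[-1] == stop]


def path_share(p: Matrix, path: List[int]) -> float:
    """Fraction of the simple k-step probability P^k[start][stop] carried by this path.

    The denominator is the sum over all k-step routes, which equals P^k[start][stop].
    """
    routes = k_step_paths(p, path[0], path[-1], len(path) - 1)
    total = sum(prob for _, prob in routes)
    return path_probability(p, path) / total if total > 0 else 0.0


if __name__ == "__main__":
    P = [[0.0, 0.5, 0.5],
         [0.0, 0.0, 1.0],
         [0.0, 0.5, 0.5]]
    print(path_probability(P, [0, 1, 2]))                # 0.5
    print([path for path, _ in k_step_paths(P, 0, 2, 2)])  # [[0, 1, 2], [0, 2, 2]]
    print(path_share(P, [0, 1, 2]))                      # about 2/3 of P^2[0][2] = 0.75
```

Exhaustive enumeration grows with the number of regions raised to the power k, so a practical version would likely prune low-probability prefixes; the sketch omits this to keep the recursion visible.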