针对当数据集合有敏感信息时,直接发布频繁序列模式本身及其支持度计数都有可能泄露用户隐私信息的问题,提出一种满足差分隐私(DP)的频繁序列模式挖掘(DP—FSM)算法。该算法利用向下封闭性质生成候选序列模式集,基于智能截断方法从候选模式中挑选出频繁的序列模式,最后采用几何机制对所选出模式的真实支持度添加噪声进行扰动。另外,为了提高挖掘结果的可用性,设计了一个阈值修正的策略来减小挖掘过程中的截断误差和传播误差。理论分析证明了该算法满足ε-差分隐私。实验结果表明了该算法在拒真率(FNR)和相对支持度误差(RSE)两个指标上明显低于对比算法PFS3,有效地提高了挖掘结果的准确度。
Focusing on the issue that releasing frequent sequence patterns and the corresponding true supports may reveal the individuals' privacy when the data set contains sensitive information, a Differential Private Frequent Sequence Mining ( DP- FSM) algorithm was proposed. Downward closure property was used to generate a candidate set of sequence patterns, smart truncating based technique was used to sample frequent patterns in the candidate set, and geometric mechanism was utilized to perturb the true supports of each sampled pattern. In addition, to improve the usability of the results, a threshold modification method was proposed to reduce truncation error and propagation error in mining process. The theoretical analysis show that the proposed method is 8-differentially private. The experimental results demonstrate that the proposed method has lower False Negative Rate (FNR) and Relative Support Error (RSE) than that of the comparison algorithm named PFS2, thus effectively improving the accuracy of mining results.