东篱科研大数据发现系统（DRDS）

位置：成果数据库 > 期刊 > 期刊详情页

差分隐私下的一种频繁序列模式挖掘方法

ISSN号：1000-1239
期刊名称：计算机研究与发展
时间：2015.12.1
页码：2789-2801
分类：TP18[自动化与计算机技术—控制科学与工程;自动化与计算机技术—控制理论与控制工程]
作者机构：[1]中国科学院软件研究所,北京100190, [2]中国科学院大学,北京100190, [3]河南财经政法大学计算机与信息工程学院,郑州450002, [4]湖南大学信息科学与工程学院,长沙410082
相关基金：国家科技重大专项基金项目（2012ZX01039-004）; 中国科学院战略性先导科技专项基金项目（XDA06010600）; 中国博士后科学基金一等资助项目（2014M560123）; 国家自然科学基金项目（61202285）
相关项目：基于邻近局部切空间相似性的多流形学习研究

关键词：频繁序列模式, 数据挖掘, 差分隐私, 隐私保护, 前缀树, frequent sequential pattern, data mining, differential privacy（DP）, privacy protection, prefix tree

中文摘要：

频繁序列模式挖掘是数据挖掘领域的1个基本问题,然而模式本身及其支持度计数都有可能泄露用户隐私信息.差分隐私（differential privacy,DP）作为一种新出现的隐私保护技术,定义了一个相当严格的攻击模型,通过添加噪音使数据失真达到隐私保护的目的.由于序列数据内在序列性和高维度的特点,给差分隐私应用于频繁序列模式挖掘带来了挑战.对此提出了一种基于交互式差分隐私保护框架的频繁序列模式挖掘算法Diff-FSPM（differential-privacy frequent sequential pattern mining）.该算法利用指数机制获取最优序列长度,并采用一种维规约策略获得原始序列数据集的规约表示,有效降低序列维度的影响;应用前缀树压缩频繁序列模式,利用拉普拉斯机制产生的噪音扰动频繁模式的真实支持度计数,同时采用闭频繁序列模式和Markov假设,有效分配隐私预算,并利用一致性约束后置处理,增强输出模式的可用性.理论角度证明算法满足ε-差分隐私,实验结果验证算法具有较好的可用性.

英文摘要：

Frequent sequential pattern mining is an exploratory problem in the field of data mining.However,directly releasing the discovered frequent patterns and the corresponding true supports may reveal the individuals＇ privacy.The state-of-the-art solution for this problem is differential privacy,which offers a strong degree of privacy protection by adding noise.Due to the inherent sequentiality and high-dimensionality in sequences,it is challenging to apply differential privacy to frequent sequential pattern mining.To address those problems,this paper presents a differentially private method called Diff-FSPM that uses an interactive way to find frequent patterns.To reduce the impact of dimensionality,Diff-FSPM first employs the exponential mechanism to obtain the optimal sequential length for truncating each sequence in the original dataset.After that,Diff-FSPM relies on aprefix-tree to compress all the frequent patterns,and then utilizes the Laplace mechanism to perturb the true support of the frequent patterns.To efficiently allocate the privacy budget,Diff-FSPM uses the closet frequent pattern and Markov assumption to guide the mining process.Finally,Diff-FSPM adopts a post-processing technique with consistency constraint to boost the accuracy of the returned noisy support counts.We theoretically prove that Diff-FSPM satisfiesε-differential privacy,and the experimental results show that it outperforms the existing methods in terms of utility.

同期刊论文项目