Mining the rich semantic information hidden in heterogeneous information networks is an important task in data mining. Outliers differ markedly from normal data objects in their values, data distribution, and generation mechanism. Detecting outliers, analyzing their generation mechanisms, and ultimately eliminating them is of great practical significance. Outlier detection in homogeneous information networks has been studied for a long time, but little work addresses dynamic outlier detection in heterogeneous information networks, and many problems remain open. Because heterogeneous information networks are dynamic, normal data objects may become outliers over time. This paper proposes a tensor-representation-based dynamic outlier detection method for heterogeneous networks, called TRBOutlier. The method builds a tensor index tree from the high-order data represented by the tensor. While searching the tensor index tree, features are added to a direct item set and an indirect item set. Meanwhile, a clustering method based on short-text correlation is used to judge whether data objects in the dataset deviate from their original clusters, and thereby to detect outliers in the network dynamically. The model preserves the semantic information of the heterogeneous network while substantially reducing time and space complexity. Experimental results show that the proposed method detects dynamic outliers in heterogeneous information networks quickly and effectively.
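The idea of flagging objects that deviate from their original clusters over time can be illustrated with a minimal sketch. It is not the TRBOutlier implementation: it does not build the tensor index tree or the direct and indirect item sets, and it makes several assumptions that the abstract does not state. The network is assumed to be stored as a dense third-order array of shape (objects, features, time steps), cosine similarity stands in for the short-text correlation measure, and cluster centroids are fixed from the first snapshot. The names `cosine_to_centroids` and `detect_dynamic_outliers` are hypothetical.

```python
import numpy as np

def cosine_to_centroids(snapshot, centroids):
    """Cosine similarity between each object (row) and each cluster centroid."""
    eps = 1e-12  # guards against division by zero for all-zero rows
    a = snapshot / (np.linalg.norm(snapshot, axis=1, keepdims=True) + eps)
    b = centroids / (np.linalg.norm(centroids, axis=1, keepdims=True) + eps)
    return a @ b.T

def detect_dynamic_outliers(tensor, initial_labels):
    """Return {t: [object indices]} for objects that left their original cluster.

    tensor: array of shape (n_objects, n_features, n_steps)
    initial_labels: cluster label of each object at step 0
    """
    initial_labels = np.asarray(initial_labels)
    clusters = np.unique(initial_labels)
    # Assumption of this sketch: centroids are computed once, from step 0.
    centroids = np.stack(
        [tensor[initial_labels == c, :, 0].mean(axis=0) for c in clusters]
    )
    outliers = {}
    for t in range(1, tensor.shape[2]):
        sims = cosine_to_centroids(tensor[:, :, t], centroids)
        current = clusters[np.argmax(sims, axis=1)]  # most similar cluster now
        outliers[t] = np.flatnonzero(current != initial_labels).tolist()
    return outliers

# Toy data: four objects, three features, two time steps.
base = np.array([[1.0, 0.0, 0.0],
                 [0.9, 0.1, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.9, 0.1]])
tensor = np.stack([base, base.copy()], axis=2)
tensor[1, :, 1] = [0.0, 1.0, 0.0]  # object 1 drifts toward the other cluster

result = detect_dynamic_outliers(tensor, [0, 0, 1, 1])
print(result)  # {1: [1]}
```

In the toy data, object 1 starts in cluster 0 but resembles cluster 1 at the second time step, so it is the only object reported. A dense array is used only for readability; a real heterogeneous network would likely need a sparse representation.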