A Low-Redundancy Top-k Anomaly Discovery Method for High-Dimensional Data (面向高维数据的低冗余top-k异常点发现方法)

Journal: 计算机研究与发展 (Journal of Computer Research and Development), Issue 05, pp. 788-795, 2010-05-15
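Classification: TP311.13 [Automation and Computer Technology / Computer Software and Theory; Automation and Computer Technology / Computer Science and Technology]
Author affiliations: [1] School of Electronics Engineering and Computer Science, Peking University, Beijing 100871; [2] Key Laboratory of Machine Perception (Peking University), Ministry of Education, Beijing 100871; [3] Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing 100871
Funding: National High Technology Research and Development Program of China ("863" Program) (2007AA120502); National Natural Science Foundation of China (60874082)
Related project: Research on Dynamic Data-Driven Prediction of the Evolution of Road Traffic Congestion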

Keywords: data mining, anomaly detection, high dimensional data, redundancy-aware (low redundancy), exception measure

Abstract (translated from the Chinese):

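Anomaly discovery is an important class of tasks in data mining. Addressing two problems, how to measure the degree of exception of high-dimensional objects and how to avoid redundancy in the set of anomalies, this paper proposes a new anomaly discovery method for high-dimensional data. The method represents high-dimensional data as a bipartite graph and uses the compression capability of each high-dimensional object as the measure of its degree of exception, which lets it handle high-dimensional data containing attributes of different types. To remove redundancy from the set of top-k anomalies, the concept of low-redundancy top-k anomalies is proposed. Because computing the exact set of low-redundancy top-k anomalies is NP-hard, a heuristic algorithm, k-AnomaliesHD, is designed to compute an approximate set. Experimental results on real and synthetic datasets show that the method scales well and, compared with anomaly discovery methods that ignore redundancy, summarizes the anomalous patterns in the data more effectively.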
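The abstract says only that objects are placed in a bipartite graph and scored by how well they compress. As a minimal sketch under our own assumptions (not the authors' actual scoring), the code below links each object to its attribute-value items, discretizes numeric attributes with hypothetical fixed bin widths, and uses the Shannon code length of an object's items under a per-item frequency model as a stand-in for "poor compressibility": objects made of rare items need more bits and therefore score as more anomalous. The names `build_bipartite` and `code_length_scores` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Set, Tuple

Item = Tuple[str, Hashable]  # one attribute-value node on the item side of the graph


def build_bipartite(records: List[Dict[str, Hashable]],
                    bin_width: Dict[str, float]) -> List[Set[Item]]:
    """Return, for each object, the set of item nodes it is linked to.

    Binary and categorical values become items as-is; numeric attributes listed
    in `bin_width` are discretized into fixed-width bins (an assumption).
    """
    graph = []
    for rec in records:
        items = set()
        for attr, value in rec.items():
            if attr in bin_width:
                value = math.floor(value / bin_width[attr])
            items.add((attr, value))
        graph.append(items)
    return graph


def code_length_scores(graph: List[Set[Item]]) -> List[float]:
    """Bits needed to encode each object's items under per-item frequencies.

    Higher score = compresses worse = more anomalous (a stand-in measure).
    """
    n = len(graph)
    freq = Counter(item for items in graph for item in items)
    return [sum(-math.log2(freq[item] / n) for item in items) for items in graph]


if __name__ == "__main__":
    data = [{"color": "red", "size": 1.0}, {"color": "red", "size": 1.2},
            {"color": "red", "size": 1.1}, {"color": "blue", "size": 9.0}]
    scores = code_length_scores(build_bipartite(data, {"size": 1.0}))
    print([round(s, 2) for s in scores])  # [0.83, 0.83, 0.83, 4.0]
```

In this toy data the fourth object, the only one with a blue color and a large size, needs 4 bits, versus about 0.83 bits for each of the three similar objects.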

Abstract (English original):

Discovering anomalies is an important data mining task that has been studied in many applications. In this paper, by emphasizing the problems of exception measurement of high dimensional objects and of redundancy in the set of anomalies, an approach is proposed to discover the anomalies in high dimensional data. With a bipartite graph representation of the given high dimensional dataset, the capability of compression of each object is used to measure the degree of exception of the object. Based on this exception measure, datasets containing different types of attributes, such as binary, categorical and numeric attributes, are well supported. To solve the problem of redundancy in the set of top-k anomalies, the concept of redundancy-aware top-k anomalies is proposed. Because mining the exact set of redundancy-aware top-k anomalies is NP-hard, an algorithm based on greedy heuristics, named k-AnomaliesHD, is designed to discover an approximate set of the redundancy-aware top-k anomalies efficiently. The experimental study on both real and synthetic datasets shows that the algorithm scales linearly with the dimensionality of the dataset and quadratically with the size of the dataset. Further, compared with the redundancy-unaware method, the set of redundancy-aware top-k anomalies covers the abnormal patterns of the data much more effectively.
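To illustrate why redundancy-aware selection calls for a greedy heuristic, here is a hedged sketch that casts it as weighted maximum coverage, an NP-hard problem for which the greedy rule has a well-known (1 - 1/e) approximation guarantee. This formulation is our assumption, not the paper's definition of k-AnomaliesHD: each candidate anomaly carries a set of abnormal items with hypothetical weights, an item is credited only once, and at each step the object covering the most not-yet-covered weight is chosen, so near-duplicate anomalies stop earning credit. The function name `redundancy_aware_topk` is hypothetical.

```python
from typing import Dict, Hashable, List, Set


def redundancy_aware_topk(item_sets: List[Set[Hashable]],
                          weight: Dict[Hashable, float],
                          k: int) -> List[int]:
    """Greedily pick k objects whose abnormal items cover the most total weight.

    Hypothetical formulation (weighted maximum coverage): an item counts once,
    so an object repeating an already-covered pattern gains nothing.
    Ties are broken in favor of the lower object index.
    """
    covered: Set[Hashable] = set()
    chosen: List[int] = []
    remaining = list(range(len(item_sets)))
    for _ in range(min(k, len(item_sets))):
        # Marginal gain = total weight of this object's items not yet covered.
        best = max(remaining,
                   key=lambda i: sum(weight[it] for it in item_sets[i] - covered))
        chosen.append(best)
        covered |= item_sets[best]
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    objs = [{"a", "b"}, {"a", "b"}, {"c"}, {"d"}]
    w = {"a": 3.0, "b": 3.0, "c": 2.0, "d": 1.0}
    print(redundancy_aware_topk(objs, w, 2))  # [0, 2]
```

With two identical anomalies (objects 0 and 1), a plain ranking by total item weight would return both for k = 2, whereas the greedy selection returns [0, 2], replacing the duplicate with an object that shows a different abnormal pattern.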

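Project of This Journal Paper

Research on Dynamic Data-Driven Prediction of the Evolution of Road Traffic Congestion

Journal papers: 13; Conference papers: 24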

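Journal Papers from the Same Project

An Adaptive Traffic Flow Prediction Mechanism Based on Locally Weighted Learning

Accelerating Sequence Searching: Dimension Reduction Method

Efficient approaches for summarizing subspace clusters into k representatives

An Iterative Self-Learning K-Means Algorithm Based on a Meta-Heuristic Strategy

An Effective Mobile User Classification Algorithm Based on Life Entropy

A Split-Merge Clustering Algorithm Oriented toward Structural Stability

Process Neural Network Modeling for Real-Time Short-Term Traffic Flow Prediction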