Adaptive Grid-Density Clustering Algorithm for Data Streams under an Uncertainty Model

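ISSN: 1000-1239
Journal: 计算机研究与发展 (Journal of Computer Research and Development)
Year: 2014
Pages: 2518-2527
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] College of Automation, Harbin Engineering University, Harbin 150001; [2] College of Computer Science and Technology, Harbin Engineering University, Harbin 150001
Funding: National Natural Science Foundation of China (61202274); China Postdoctoral Science Foundation (2012M510927); Heilongjiang Postdoctoral Science Foundation (LBH-Z12066); Fundamental Research Funds for the Central Universities (HEUCF100602)
Related project: Research on adaptive clustering analysis and evolution analysis methods for uncertain data streams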

Keywords: uncertainty, data stream, clustering, grid-density, adaptive density threshold, uncertainty model

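Chinese abstract (translated):

With the development and application of computing and sensing technology, a new kind of data in the form of uncertain data streams has appeared across many fields and has attracted the attention of many researchers. Existing data stream clustering techniques generally ignore uncertainty, which often makes clustering results unreasonable or even unusable. The few clustering methods that do address uncertainty consider it only partially, and most are based on the K-Means algorithm, which has inherent defects. To address this problem, the paper proposes an adaptive grid-density clustering algorithm for data streams under an uncertainty model (adaptive density-based clustering algorithm over uncertain data stream, ADC-UStream). To handle uncertainty, the algorithm builds an entropy uncertainty model under a strategy that unifies existence-level and attribute-level uncertainty, so that uncertainty is measured comprehensively. It adopts grid-density clustering and, based on the decay window model, designs temporally and spatially adaptive density thresholds to suit the temporal nature and non-uniform distribution of uncertain data streams. Experimental results show that ADC-UStream performs well in both clustering quality and clustering efficiency.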

English abstract:

Uncertain data streams are a new and widespread form of data that is emerging in many application fields with the development of computer and sensing technology. Research on the analysis and processing of uncertain data streams has attracted the attention of many researchers. Existing data stream clustering techniques generally ignore uncertainty, which often makes the clustering results unreasonable or even unusable. The two aspects of uncertainty, existence uncertainty and attribute uncertainty, can significantly affect the clustering process and its results, but existing work does not consider both at the same time. Recently reported clustering algorithms are all based on the K-Means algorithm, which has inherent shortcomings. To solve this problem, an adaptive grid-density based data stream clustering algorithm under an uncertainty model, ADC-UStream, is proposed. To handle uncertainty, the algorithm builds an entropy uncertainty model that measures uncertainty under a unified treatment of existence-level and attribute-level uncertainty. With uncertainty considered comprehensively, grid-density based clustering over a decay (attenuation) window model is adopted, with temporally and spatially adaptive density thresholds designed to suit the temporal and non-uniformly distributed nature of uncertain data streams. Experimental results show that ADC-UStream under the uncertainty model performs well in both clustering quality and clustering efficiency.
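
The abstract names three ingredients (an entropy-based uncertainty measure, grid densities under a decay window, and an adaptive density threshold) without giving their formulas. The Python sketch below shows one way such pieces could fit together; it is not the ADC-UStream algorithm itself. Everything specific in it is an assumption made for illustration: the weight `p * (1 - H(p))` built from the binary entropy of the existence probability, the exponential decay factor, the threshold taken as a multiple of the mean density of occupied cells, and all names (`certainty_weight`, `DecayedGrid`, `cluster_cells`). Attribute-level uncertainty is left out, and timestamps are assumed to be non-decreasing.

```python
import math


def binary_entropy(p):
    """Shannon entropy (in bits) of an existence probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def certainty_weight(p_exist):
    """Hypothetical stand-in for the paper's entropy uncertainty model:
    a point contributes less density the more uncertain its existence is."""
    return p_exist * (1.0 - binary_entropy(p_exist))


class DecayedGrid:
    """Grid cells whose densities fade exponentially with time."""

    def __init__(self, cell_size, decay=0.98, factor=1.0):
        self.cell_size = cell_size
        self.decay = decay      # assumed per-time-unit decay factor in (0, 1)
        self.factor = factor    # assumed multiplier for the adaptive threshold
        self.cells = {}         # cell key -> (density, time of last update)

    def _key(self, point):
        return tuple(math.floor(x / self.cell_size) for x in point)

    def density(self, key, now):
        """Density of a cell at time `now`, decayed since its last update."""
        d, t = self.cells.get(key, (0.0, now))
        return d * self.decay ** (now - t)

    def insert(self, point, p_exist, now):
        """Add one uncertain point, weighted by its certainty."""
        key = self._key(point)
        self.cells[key] = (self.density(key, now) + certainty_weight(p_exist), now)

    def threshold(self, now):
        """Assumed adaptive threshold: factor * mean density of occupied cells."""
        dens = [self.density(k, now) for k in self.cells]
        return self.factor * sum(dens) / len(dens) if dens else 0.0

    def dense_cells(self, now):
        th = self.threshold(now)
        return {k for k in self.cells if self.density(k, now) > th}


def cluster_cells(dense):
    """Group dense cells into clusters of face-adjacent cells."""
    remaining = set(dense)
    clusters = []
    while remaining:
        stack = [remaining.pop()]
        group = set(stack)
        while stack:
            cell = stack.pop()
            for dim in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:dim] + (cell[dim] + step,) + cell[dim + 1:]
                    if nb in remaining:
                        remaining.remove(nb)
                        group.add(nb)
                        stack.append(nb)
        clusters.append(group)
    return clusters


grid = DecayedGrid(cell_size=1.0, decay=0.5)
for point, p in [((0.2, 0.3), 1.0), ((0.5, 0.5), 1.0),
                 ((1.5, 0.5), 1.0), ((1.6, 0.4), 1.0),
                 ((5.5, 5.5), 0.5)]:
    grid.insert(point, p, now=0)
found = cluster_cells(grid.dense_cells(now=0))
```

In this toy run the point with existence probability 0.5 carries no weight under the assumed formula, so its cell never becomes dense, while the two neighbouring cells with certain points merge into one cluster. Because the threshold is recomputed from the current decayed densities, it falls as old data fades, which is one simple reading of "temporally adaptive"; a spatially adaptive threshold would need a per-region rule that the abstract does not specify.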
