Bayesian Clustering Algorithm for Categorical Data

ISSN: 1001-9081
Journal: Journal of Computer Applications (《计算机应用》)
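Classification: TP274.2 [Automation and Computer Technology: Control Science and Engineering; Automation and Computer Technology: Detection Technology and Automatic Equipment]
Author affiliations: [1] Southwest China Institute of Electronic Technology, Chengdu 610036; [2] School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350117
Funding: National Natural Science Foundation of China (61175123); Natural Science Foundation of Fujian Province (2015J01238)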

Keywords: data clustering, categorical attribute, attribute weighting, Bayesian clustering, probability model

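Abstract (translated from the Chinese):

To address the difficulty of defining a distance function between objects in categorical data clustering, a categorical data clustering algorithm based on Bayesian probability estimation is proposed. First, an attribute-weighted probability model is proposed, in which each categorical attribute is assigned a weight reflecting its importance. Second, by applying a transformation based on Bayes' formula, a clustering objective function based on maximum likelihood estimation is defined, and a partition-based clustering algorithm is proposed; the algorithm no longer depends on distances between objects and instead clusters according to the weighted likelihood between objects and the partition of the dataset. Third, an expression for computing the attribute weights is derived, leading to the conclusion that the weight of a categorical attribute is inversely proportional to the information entropy of its symbol distribution. Experiments on real and synthetic datasets show that, compared with existing distance-based clustering algorithms, the proposed algorithm improves clustering accuracy, with gains of 5% to 48% on bioinformatics data in particular, and yields attribute weights that are meaningful in practice.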
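The abstract states only that an attribute's weight is inversely proportional to the entropy of its symbol distribution; the exact expression is not reproduced on this page. The following is a minimal Python sketch of that relationship, assuming a hypothetical rule in which each weight is proportional to 1 / (H + eps) and the weights are normalized to sum to 1. The function names and the `eps` smoothing term are illustrative assumptions, not the authors' formulation.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (natural log) of the symbols observed in one attribute."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def inverse_entropy_weights(rows, eps=1e-9):
    """Hypothetical weighting rule (an assumption, not the paper's expression):
    weight of attribute d is proportional to 1 / (entropy_d + eps),
    then all weights are normalized so they sum to 1."""
    columns = list(zip(*rows))  # one tuple of symbols per attribute
    raw = [1.0 / (entropy(col) + eps) for col in columns]
    total = sum(raw)
    return [r / total for r in raw]


if __name__ == "__main__":
    # Attribute 0 is concentrated on one symbol; attribute 1 is spread evenly.
    rows = [("a", "x"), ("a", "y"), ("a", "z"), ("b", "w")]
    print(inverse_entropy_weights(rows))
```

Under this rule, an attribute whose values are spread evenly over many symbols (high entropy) receives a small weight, while a concentrated attribute receives a large one.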

English abstract:

To address the difficulty of defining a meaningful distance measure for categorical data clustering, a new categorical data clustering algorithm based on Bayesian probability estimation was proposed. First, a probability model with automatic attribute weighting was proposed, in which each categorical attribute is assigned an individual weight indicating its importance for clustering. Second, a clustering objective function was derived using maximum likelihood estimation and a Bayesian transformation, and a partitioning algorithm was proposed to optimize it; the algorithm groups data according to the weighted likelihood between objects and clusters rather than pairwise distances. Third, an expression for estimating the attribute weights was derived, indicating that the weight should be inversely proportional to the entropy of the category distribution. Experiments were conducted on real datasets and a synthetic dataset. The results show that the proposed algorithm yields higher clustering accuracy than existing distance-based algorithms, achieving improvements of 5% to 48% on the bioinformatics data, along with meaningful attribute-weighting results for the categorical attributes.
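The partitioning step described in the abstract, in which objects are assigned by weighted likelihood rather than by pairwise distance, can be illustrated with a small Python sketch. This is not the paper's algorithm: it assumes a uniform cluster prior, Laplace-smoothed symbol frequencies, fixed attribute weights supplied by the caller, and hard reassignment until the labels stop changing. All function and parameter names (`fit_cluster`, `weighted_loglik`, `cluster`, `alpha`, `init_labels`) are hypothetical.

```python
import math
import random
from collections import Counter


def fit_cluster(rows, domains, alpha=1.0):
    """Laplace-smoothed symbol probabilities per attribute for one cluster.
    An empty cluster falls back to a uniform distribution over each domain."""
    model = []
    for d, domain in enumerate(domains):
        counts = Counter(r[d] for r in rows)
        denom = len(rows) + alpha * len(domain)
        model.append({s: (counts[s] + alpha) / denom for s in domain})
    return model


def weighted_loglik(row, model, weights):
    """Sum over attributes of weight_d * log P(symbol_d | cluster)."""
    return sum(w * math.log(model[d][row[d]]) for d, w in enumerate(weights))


def cluster(rows, k, weights, n_iter=20, seed=0, init_labels=None):
    """Assumed scheme: alternate between fitting per-cluster models and
    reassigning each object to the cluster with the highest weighted
    log-likelihood (uniform cluster prior assumed)."""
    rng = random.Random(seed)
    n_attrs = len(rows[0])
    domains = [sorted({r[d] for r in rows}) for d in range(n_attrs)]
    if init_labels is None:
        labels = [rng.randrange(k) for _ in rows]
    else:
        labels = list(init_labels)
    for _ in range(n_iter):
        models = [
            fit_cluster([r for r, l in zip(rows, labels) if l == j], domains)
            for j in range(k)
        ]
        new_labels = [
            max(range(k), key=lambda j: weighted_loglik(r, models[j], weights))
            for r in rows
        ]
        if new_labels == labels:  # converged: no object changed cluster
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    data = [("a", "x", "p")] * 4 + [("b", "y", "q")] * 4
    start = [0, 0, 0, 1, 1, 1, 1, 0]  # deliberately imperfect starting partition
    print(cluster(data, k=2, weights=[1 / 3] * 3, init_labels=start))
```

No pairwise distance between objects is computed anywhere in the loop; each object is compared only against per-cluster symbol distributions. A random starting partition can converge to a degenerate solution, so in practice several restarts would be needed.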

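Projects associated with this journal paper

Research on event sequence mining methods for software behavior identification

Journal papers: 53; conference papers: 11; awards: 2; books: 1

Predictive models for fine-grained behavior data and their learning

Journal papers: 3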

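Journal papers from the same project

A dual-geometry model for identifying proximity relationships between clusters

A hierarchical feature selection method for detecting obfuscated malicious code

An intelligent anti-phishing architecture using a combination of multiple classifiers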

A Novel Hierarchical Clustering Algorithm for Gene Sequences

Combined New Nonnegative Matrix Factorization Algorithms with Two-dimensional Nonnegative Matrix Fac

Soft subspace clustering of categorical data with probabilistic distance

Projected-prototype based classifier for text categorization

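Clustering algorithm selection based on grid minimum spanning trees

Non-mode clustering of categorical data with attribute weighting

A new cooperative multi-robot path planning algorithm

A robot path planning algorithm based on improved Theta*

An improved classification algorithm based on RSKNN

Configuration of business process models

Microblog sentiment classification based on feature extraction from part-of-speech tag sequences

A graph similarity measure that matches global structure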

EM-type method for measuring graph dissimilarity

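Projected clustering algorithm with adaptive entropy

DNA sequence classification method based on hidden Markov models

High-dimensional clustering algorithm with optimized subspaces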

A Novel Variable-order Markov Model for Clustering Categorical Sequences

A probabilistic framework for optimizing projected clusters with categorical attributes

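Hierarchical clustering of symbolic sequences with normalized similarity

Classification of DNA sequences with second-order hidden Markov models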

Nearest neighbor classification of categorical data by attributes weighting

SMwKnn: a mutual k-nearest-neighbor algorithm weighted by class-subspace distance

Malicious sequential pattern mining for automatic malware detection

Learning and classification of malicious behavior in software code

Sequence similarity measure based on symbol entropy

Kernel-based linear classification on categorical data

A mixture-model-based algorithm for detecting concept drift in data streams

Image Processing using Newton-based Algorithm of Nonnegative Matrix Factorization

A weighted variable-order Markov model for event sequences

Modeling and Analyzing Mixed Communications in Service-oriented Trustworthy Software

Malware identification techniques and their applications

Analyzing Event-based Scheduling in Concurrent Reactive Systems

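A composite similarity measure for time series based on symbolic aggregate approximation

AdaBoost face detection algorithm based on new Haar-like features

A hybrid multi-objective evolutionary algorithm based on Pareto ranking

A secondary path planning algorithm based on a sliding window and ant colony optimization

A survey of visualization techniques for clustering results on multidimensional data

Compressive-sensing target tracking algorithm incorporating velocity features

Rare-class identification algorithm based on inter-cluster separability

Multi-objective optimization algorithm with co-evolution of polymorphic populations

Formal dependency analysis of service virtualization in cloud computing

Many-objective optimization incorporating an opening-angle crowding control strategy

Soft subspace clustering algorithm for imbalanced data

Clustering algorithm based on kernel density estimation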

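Journal information

Journal of Computer Applications (《计算机应用》)
Peking University Core Journal (2011 edition)

Supervising body: Sichuan Association for Science and Technology
Sponsors: Sichuan Computer Federation; Chengdu Branch, Chinese Academy of Sciences
Editor-in-chief: Zhang Jingzhong
Address: Institute of Computing, Chengdu Branch of the Chinese Academy of Sciences, No. 9, Section 4, Renmin South Road, Chengdu
Postal code: 610041
Email: xzh@joca.cn
Phone: 028-85224283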

International standard serial number: ISSN 1001-9081
Domestic serial number: CN 51-1307/TP
Postal distribution code: 62-110

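Awards:
First Prize, National Excellent Science and Technology Journals; National Periodical Award nominee; China Periodical Matrix "double award" journal; Chinese Core Journal; China Science and Technology Core Journal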

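Indexed in (domestic and international databases):
Abstract Journal (Russia), Index Copernicus (Poland), Cambridge Scientific Abstracts (USA), Science Abstracts database (UK), Japan Science and Technology Agency database (Japan), China Science and Technology Core Journals (China), Peking University Core Journals (China; 2000, 2004, 2008, 2011, and 2014 editions)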

Citations: 53679