Dongli Research Big Data Discovery System (DRDS)


Attribute-Weighted Non-Mode Clustering of Categorical Data

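ISSN: 1000-9825
Journal: Journal of Software (软件学报)
Date: 2013.11
Pages: 2628-2641
Classification: TP181 [Automation and Computer Technology: Control Science and Engineering; Automation and Computer Technology: Control Theory and Control Engineering]
Author affiliation: [1] School of Mathematics and Computer Science, Fujian Normal University, Fuzhou, Fujian 350108
Funding: National Natural Science Foundation of China (61175123)
Related project: Research on Event Sequence Mining Methods for Software Behavior Identification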

Keywords: clustering, categorical data, mode, attribute weighting

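Chinese abstract (translated):

Categorical data are widespread in many application domains such as bioinformatics, and their discrete-valued nature makes clustering categorical data a difficult task in statistical machine learning. Current mainstream methods rely on the modes of categorical attributes to optimize clusters and to compute the weights of relevant attributes. This paper proposes a non-mode statistical clustering method for categorical data. First, based on a newly defined dissimilarity measure, an attribute-weighted objective function for categorical data clustering is derived. Because the function is built on the average distance between objects and clusters, it avoids the problems that arise when existing methods use the mode as the cluster center. Second, a soft subspace clustering algorithm for categorical data is defined. During clustering, the algorithm assigns each attribute a weight measuring its relevance to each cluster according to the overall distribution of the attribute's values, rather than the mode alone, thereby performing automatic feature selection. Experiments on synthetic and real-world data sets show that, compared with existing mode-based clustering algorithms and other non-mode algorithms based on Monte Carlo optimization, the proposed algorithm effectively improves clustering quality.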

English abstract:

While categorical data are widely used in many applications such as bioinformatics, clustering categorical data is a difficult task in the field of statistical machine learning because the data can only take discrete values. Typically, mainstream methods depend on the mode of the categorical attributes to optimize the clusters and weight the relevant attributes. This paper proposes a non-mode approach to the statistical clustering of categorical data. First, based on a newly defined dissimilarity measure, an objective function with attribute weighting is derived for categorical data clustering. The objective function is defined in terms of the average distance between the objects and the clusters, and therefore overcomes the problems of existing methods based on the mode category. Then, a soft-subspace clustering algorithm is proposed for clustering categorical data. In this algorithm, each attribute is assigned weights measuring its degree of relevance to the clusters in terms of the overall distribution of categories instead of the mode category, enabling automatic feature selection during the clustering process. Experimental results on synthetic and real-world datasets demonstrate that the proposed method significantly improves clustering quality.
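The abstract does not state the dissimilarity measure or the weighting formula, so the following Python sketch only illustrates the general idea under assumed definitions, not the paper's method. It assumes that the distance from an object to a cluster on one attribute is the average simple-matching distance to all cluster members (one minus the relative frequency of the object's category in the cluster), and that an attribute's weight in a cluster is proportional to how concentrated its category distribution is (the sum of squared frequencies). The function names `category_frequencies`, `attribute_weights`, `weighted_distance`, and `cluster` are hypothetical.

```python
import random
from collections import Counter


def category_frequencies(members, n_attrs):
    """Relative frequency of every category, per attribute, within one cluster."""
    total = len(members)
    freqs = []
    for d in range(n_attrs):
        counts = Counter(row[d] for row in members)
        freqs.append({cat: n / total for cat, n in counts.items()})
    return freqs


def attribute_weights(freqs):
    """Assumed weighting rule: concentration sum_c f(c)^2, normalised to sum to 1.

    An attribute whose values are concentrated on few categories in the
    cluster gets a larger weight. The whole distribution is used, not the mode.
    """
    concentration = [sum(f * f for f in fd.values()) for fd in freqs]
    total = sum(concentration)
    return [c / total for c in concentration]


def weighted_distance(row, freqs, weights):
    """Weighted average simple-matching distance from one object to a cluster."""
    return sum(w * (1.0 - fd.get(value, 0.0))
               for value, fd, w in zip(row, freqs, weights))


def cluster(data, k, n_iter=20, seed=0):
    """k-means-style alternation between cluster statistics and assignments."""
    rng = random.Random(seed)
    n_attrs = len(data[0])
    labels = [rng.randrange(k) for _ in data]
    models = []
    for _ in range(n_iter):
        models = []
        for j in range(k):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if not members:
                members = [rng.choice(data)]  # re-seed an empty cluster
            freqs = category_frequencies(members, n_attrs)
            models.append((freqs, attribute_weights(freqs)))
        new_labels = [
            min(range(k), key=lambda j: weighted_distance(row, *models[j]))
            for row in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, models
```

Calling `labels, models = cluster(data, k=2)` on a list of equal-length tuples of category values returns one label per object, plus the per-cluster frequency tables and attribute weights. Because every cluster is summarised by full frequency tables, two clusters that share the same mode on an attribute can still be told apart by the rest of the distribution.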

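Project associated with this paper

Research on Event Sequence Mining Methods for Software Behavior Identification

Journal papers: 53; conference papers: 11; awards: 2; books: 1

Journal papers from the same project

A Dual-Geometry Model for Identifying Proximity Relationships Between Clusters

A Hierarchical Feature Selection Method for Detecting Obfuscated Malicious Code

An Intelligent Anti-Phishing Architecture Using a Combination of Multiple Classifiers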

A Novel Hierarchical Clustering Algorithm for Gene Sequences

Combined New Nonnegative Matrix Factorization Algorithms with Two-dimensional Nonnegative Matrix Fac

Soft subspace clustering of categorical data with probabilistic distance

Projected-prototype based classifier for text categorization

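Clustering Algorithm Selection Based on Grid Minimum Spanning Trees

A New Cooperative Multi-Robot Path Planning Algorithm

A Robot Path Planning Algorithm Based on Improved Theta*

An Improved Classification Algorithm Based on RSKNN

Configuration of Business Process Models

Microblog Sentiment Classification Based on Feature Extraction from Part-of-Speech Tag Sequences

A Graph Similarity Measure That Matches Global Structure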

EM-type method for measuring graph dissimilarity

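A Projected Clustering Algorithm with Adaptive Entropy

A DNA Sequence Classification Method Based on Hidden Markov Models

A High-Dimensional Clustering Algorithm with Optimized Subspaces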

A Novel Variable-order Markov Model for Clustering Categorical Sequences

A probabilistic framework for optimizing projected clusters with categorical attributes

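Hierarchical Clustering of Symbolic Sequences with Normalized Similarity

Classification of DNA Sequences with Second-Order Hidden Markov Models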

Nearest neighbor classification of categorical data by attributes weighting

SMwKnn: A Mutual K-Nearest-Neighbor Algorithm with Class-Subspace Distance Weighting

Malicious sequential pattern mining for automatic malware detection

Learning and Classification of Malicious Behavior in Software Code

A Sequence Similarity Measure Based on Symbol Entropy

Kernel-based linear classification on categorical data

A Mixture-Model-Based Algorithm for Detecting Concept Drift in Data Streams

Image Processing using Newton-based Algorithm of Nonnegative Matrix Factorization

A Weighted Variable-Order Markov Model for Event Sequences

Modeling and Analyzing Mixed Communications in Service-oriented Trustworthy Software

Malware Identification Techniques and Their Applications

Analyzing Event-based Scheduling in Concurrent Reactive Systems

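A Composite Time Series Similarity Measure Based on Symbolic Aggregate Approximation

An AdaBoost Face Detection Algorithm Based on New Haar-like Features

A Hybrid Multi-Objective Evolutionary Algorithm Based on Pareto Ranking

A Secondary Path Planning Algorithm Based on Sliding Windows and Ant Colony Optimization

A Bayesian Clustering Algorithm for Categorical Data

A Survey of Visualization Techniques for Clustering Results on Multidimensional Data

A Compressive Sensing Object Tracking Algorithm Incorporating Velocity Features

A Rare Class Identification Algorithm Based on Inter-Cluster Separability

A Multi-Objective Optimization Algorithm with Co-Evolution of Polymorphic Populations

Formal Dependency Analysis of Service Virtualization in Cloud Computing

Many-Objective Optimization Incorporating an Opening-Angle Crowding Control Strategy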

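Journal information

Journal of Software (《软件学报》)
Peking University Core Journal (2011 edition)

Supervising organization: Chinese Academy of Sciences
Sponsors: Institute of Software, Chinese Academy of Sciences; China Computer Federation
Editor-in-chief: Zhao Chen
Address: P.O. Box 8718, Beijing, Institute of Software, Chinese Academy of Sciences
Postal code: 100190
Email: jos@iscas.ac.cn
Telephone: 010-62562563

International standard serial number: ISSN 1000-9825
Domestic serial number: CN 11-2560/TP
Postal distribution code: 82-367

Awards:
Selected as a "Double Hundred Journal" of the China Periodical Square Matrix (中国期刊方阵) in 2001; awarded First Prize for Outstanding Scientific and Technical Journals by the Chinese Academy of Sciences in 2000

Indexed in the following domestic and international databases:
Abstract Journal (Russia), Mathematical Reviews online edition (USA), Index Copernicus (Poland), Zentralblatt MATH (Germany), Scopus abstract and citation database (Netherlands), Engineering Index (USA), Cambridge Scientific Abstracts (USA), Science Abstracts database (UK), Japan Science and Technology Agency database (Japan), China Science and Technology Core Journals (China), Peking University Core Journals (China; 2000, 2004, 2008, 2011, and 2014 editions)

Citation count: 54609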