Non-Redundant Multi-View Clustering Based on the Information Bottleneck (IB) Method

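ISSN: 1000-1239
Journal: Journal of Computer Research and Development (计算机研究与发展)
Date: 2013-09-15
Pages: 1865-1875
Classification: TP181 [Automation and Computer Technology: Control Science and Engineering; Automation and Computer Technology: Control Theory and Control Engineering]
Author affiliation: [1] School of Information Engineering, Zhengzhou University, Zhengzhou 450052
Funding: National Natural Science Foundation of China (61170223, 61202207); Joint Funds of the National Natural Science Foundation of China (U1204610)
Related project: Automatic mapping of cross-media complex problems in scalable transfer learning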

Keywords: clustering, non-redundant multi-view, information bottleneck (IB) method, mutual information, MeanNN differential entropy

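Chinese abstract (translated):

To address the problem of mining multi-view patterns in data, this paper proposes NrMIB, a non-redundant multi-view clustering algorithm based on the IB method. On the one hand, the algorithm uses the IB principle to preserve as much information as possible in the clustering result, which ensures high-quality clusterings. On the other hand, it minimizes the mutual information between the clustering result and the known partitions of the data, which ensures that the new clustering is non-redundant with respect to those known partitions. NrMIB is suitable for analyzing both co-occurrence data and non-co-occurrence data in Euclidean space, can discover both linearly and non-linearly separable patterns, and needs no additional parameters to estimate the amount of information in Euclidean space. Experiments on synthetic-data pattern recognition, face recognition, and document clustering show that NrMIB effectively discovers the multiple reasonable partitions contained in the data, and that it outperforms traditional single-view clustering algorithms as well as three existing non-redundant multi-view clustering algorithms.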
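The redundancy criterion described above (low mutual information between a new clustering and a known partition) can be illustrated with a small sketch. This is not the NrMIB algorithm itself. It only computes the empirical mutual information between two label vectors, and the function name `mutual_information` and the toy label lists are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(labels_a, labels_b):
    """Empirical mutual information (in nats) between two label sequences."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)
    count_a = Counter(labels_a)                 # marginal counts of partition A
    count_b = Counter(labels_b)                 # marginal counts of partition B
    count_ab = Counter(zip(labels_a, labels_b))  # joint counts
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical toy partitions of four items.
known = [0, 0, 1, 1]        # the partition that is already known
redundant = [1, 1, 0, 0]    # same grouping as `known`, labels renamed
alternative = [0, 1, 0, 1]  # grouping that cuts across `known`

print(mutual_information(known, redundant))    # equals log(2): fully redundant
print(mutual_information(known, alternative))  # equals 0: non-redundant
```

A renamed copy of the known partition reaches the maximum value for two balanced clusters, log 2, while a partition that is statistically independent of the known one scores zero. A non-redundant method would prefer candidates of the second kind, provided they still preserve enough information about the data.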

English abstract:

Typical clustering algorithms output a single partition of the data. However, in real-world applications, data can often be interpreted in many different ways and has different reasonable partitions from multiple views. Instead of committing to one clustering solution, here we introduce a novel algorithm, NrMIB (non-redundant multi-view information bottleneck), which can provide the user with several non-redundant clustering solutions from multiple views. Our approach employs the information bottleneck (IB) method, which aims to maximize the relevant information preserved by the clustering results, to ensure the quality of the clustering solutions, while the mutual information between the clustering labels and the known data partitions is minimized to ensure that the new clustering solutions are non-redundant. By adopting mutual information and MeanNN differential entropy to estimate the preserved information, NrMIB can be used to analyze both co-occurrence data and Euclidean space data. In addition, our algorithm is suitable for analyzing high-dimensional data and can discover both linear and non-linear cluster shapes. We perform experiments on synthetic data pattern recognition, face recognition, and document clustering to assess our method against a wide range of clustering algorithms from the literature. The experimental results show that the proposed NrMIB algorithm can discover the multiple reasonable partitions residing in the data, and that the performance of NrMIB is superior to that of the three non-redundant multi-view clustering algorithms examined here.
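To give a feel for why a MeanNN-style entropy estimate needs no extra parameters, here is a minimal sketch. It assumes the form usually given for the MeanNN estimator: the data dimension times the average log pairwise distance, plus an additive constant that depends only on the dimension and sample size. That constant is omitted here, so only differences between estimates on equally sized samples are meaningful. How NrMIB uses the estimate is not stated in the abstract, and the name `meannn_entropy_unnormalized` is hypothetical.

```python
import math


def meannn_entropy_unnormalized(points):
    """MeanNN-style differential entropy estimate, additive constant omitted.

    points: list of equal-length coordinate tuples, all distinct.
    Assumed form: d / (n * (n - 1)) * sum over i != j of log ||x_i - x_j||.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    d = len(points[0])
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.dist(points[i], points[j])
            if dist == 0.0:
                raise ValueError("duplicate points would give log(0)")
            total += math.log(dist)
    # The sum over ordered pairs i != j is twice the sum over i < j.
    return d * 2.0 * total / (n * (n - 1))


# Hypothetical toy data: a unit square and the same square scaled by 3.
tight = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
spread = [(3.0 * x, 3.0 * y) for x, y in tight]

gain = meannn_entropy_unnormalized(spread) - meannn_entropy_unnormalized(tight)
print(gain, 2 * math.log(3))  # scaling 2-D data by 3 adds 2 * log(3)
```

The estimate uses all pairwise distances and has no bandwidth or neighbor count to tune. The scaling check matches a known property of differential entropy: multiplying d-dimensional data by a factor s raises the entropy by d log s.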
