Dongli Research Big Data Discovery System (DRDS)


Tag clustering algorithm LMMSK: improved K-means algorithm based on latent semantic analysis

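ISSN: 1004-4132
Journal: Journal of Systems Engineering and Electronics
Date: 0
Classification: V221.3 [Aerospace Science and Technology: Aircraft Design]; TB553 [Science: Physics; Science: Acoustics]
Author affiliations: [1] School of Economics and Management, Beihang University, Beijing 100083; [2] School of Mechanical Engineering and Automation, Beihang University, Beijing 100083; [3] College of Engineering, Northeastern University, Boston 02115; [4] National Computer Network and Information Security Management Center, Beijing 100029
Funding: National Natural Science Foundation of China (71531001, 71322104, 71171007, 71471009); National "863" Program (SS2014AA012303); Fundamental Research Funds for the Central Universities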

Keywords: text clustering, disassemble-assemble algorithm, information-theoretic consensus clustering, K-means, big data clustering

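Abstract (translated from Chinese):

Although text clustering has been studied extensively in recent years, it remains a challenging problem in data mining, particularly in the handling of weakly related and even noisy features. To address this problem, a disassemble-assemble framework for text clustering, DIAS, is proposed. The method first disassembles high-dimensional text data through simple random feature sampling to obtain diverse structural knowledge, which has the advantage of largely avoiding the bulk of noisy features. It then uses information-theoretic consensus clustering (ICC) to assemble the multi-view basic clustering knowledge into a high-quality consensus partitioning. Finally, experiments on eight real-world text data sets show that DIAS has clear advantages over other widely used algorithms, and that it is particularly effective when the basic partitionings are weak. Because it is naturally suited to distributed computing, DIAS is expected to become a mainstream algorithm for large-scale text clustering.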

English abstract:

Although extensively studied, text clustering remains a critical challenge in the data mining community due to the curse of dimensionality. Various techniques have been proposed to overcome this difficulty, but the negative impact of weakly related or even noisy features is still a haunting nightmare. Meanwhile, we should never lose sight of the explosive growth of user-generated content on social media, which is extremely sparse and poses a further challenge to efficiency. In light of this, a disassemble-assemble (DIAS) framework is proposed for text clustering. Simple random feature sampling is employed by DIAS to disassemble high-dimensional text data and gain diverse structural knowledge while avoiding the bulk of noisy features. The multi-view knowledge is then assembled by fast information-theoretic consensus clustering (ICC) to gain a high-quality consensus partitioning. Extensive experiments on eight real-world text data sets are conducted to demonstrate the advantages of DIAS over some widely used methods. In particular, DIAS shows appealing merits in learning from a bulk of very weak basic partitionings. Its natural suitability for distributed computing makes DIAS a promising candidate for big text clustering.
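
The disassemble-assemble idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `kmeans` and `dias_sketch`, the parameters `n_views` and `sample_ratio`, and the toy term-count matrix are all hypothetical. Two further assumptions are made. First, each basic partitioning is produced by plain K-means on a randomly sampled subset of features. Second, the assemble step runs ordinary Euclidean K-means on the concatenated one-hot encodings of the basic partitionings, as a simplified stand-in for ICC, which the abstract names but does not specify.

```python
import random


def kmeans(rows, k, rng, iters=20):
    """Plain Lloyd's K-means on equal-length numeric rows; returns one label per row."""
    centers = [list(r) for r in rng.sample(rows, k)]  # random initial centers
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(r, centers[c])))
            for r in rows
        ]
        # Update step: mean of each cluster's members; an empty cluster keeps its center.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def dias_sketch(docs, k, n_views=8, sample_ratio=0.5, seed=0):
    """Hypothetical disassemble-assemble sketch (assumptions stated in the lead-in)."""
    rng = random.Random(seed)
    n_features = len(docs[0])
    n_sampled = max(1, int(sample_ratio * n_features))
    encoded = [[] for _ in docs]  # one growing one-hot row per document
    for _ in range(n_views):
        # Disassemble: simple random feature sampling gives one low-dimensional view.
        feats = rng.sample(range(n_features), n_sampled)
        view = [[d[f] for f in feats] for d in docs]
        basic = kmeans(view, k, rng)  # one (possibly weak) basic partitioning
        # Append this partitioning to each document's row as a one-hot block.
        for row, lab in zip(encoded, basic):
            row.extend(1.0 if lab == c else 0.0 for c in range(k))
    # Assemble: cluster the encoded basic partitionings
    # (Euclidean K-means here, standing in for ICC).
    return kmeans(encoded, k, rng)


# Toy term-count matrix: 6 documents x 6 terms (illustrative data only).
docs = [
    [3, 2, 0, 0, 1, 0],
    [2, 3, 1, 0, 0, 0],
    [3, 3, 0, 1, 0, 0],
    [0, 0, 3, 2, 0, 1],
    [0, 1, 2, 3, 0, 0],
    [1, 0, 3, 3, 0, 0],
]
labels = dias_sketch(docs, k=2)
```

Because each view is clustered independently of the others, the loop over `n_views` could in principle be run in parallel, which is consistent with the abstract's remark about distributed computing.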

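Projects associated with this journal paper

Knowledge dissemination model and community evolution paradigm for virtual communities of practice based on collaborative tagging

Journal papers: 2

Internet of Things big data mining and applications based on mobile crowdsensing

Journal papers: 2

Journal papers from the same project

Online task assignment for three types of objects in spatial crowdsourcing environments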

Density-based rough set model for hesitant node clustering in overlapping community detection

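Journal information

Journal of Systems Engineering and Electronics

Supervising organization: China Aerospace Machinery and Electronics Corporation
Sponsoring organization: Second Academy of the China Aerospace Industry Corporation
Editor-in-chief: Gao Shuxia
Address: 52 Yongding Road, Haidian District, Beijing
Postal code: 100854
Email: jseeoffice@126.com
Telephone: 010-68388406, 68386014

International standard serial number: ISSN 1004-4132
Domestic serial number: CN 11-3018/N
Postal distribution code: 82-270

Awards:
Outstanding Journal Award of the aerospace system; indexed by Engineering Index (EI, USA) and Science Abstracts (SA, UK)

Indexed in domestic and international databases:
Netherlands abstract and citation database, Engineering Index (USA), Cambridge Scientific Abstracts (USA), Science Citation Index Expanded (USA), Science Abstracts database (UK)

Citations: 242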