In text mining, the Vector Space Model (VSM) is commonly used to represent text features. The resulting text feature matrix has very high dimensionality, which makes subsequent processing computationally expensive, so the matrix should be reduced to a lower-dimensional space before mining. Latent Semantic Analysis (LSA), Concept Indexing (CI), Non-negative Matrix Factorization (NMF), and Random Projection (RP) are four effective dimension reduction methods. After analyzing the meaning of each method's reduced space and its computational complexity, the differences among the methods are compared through text clustering experiments. The experiments show that these methods not only reduce the dimensionality of the text feature space effectively, but also reveal, to varying degrees, the semantic relations between documents and terms, thereby improving the efficiency and accuracy of text mining.
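To make the idea concrete, the following is a minimal sketch of one of the methods named above, Latent Semantic Analysis, implemented as a truncated SVD of a term-document matrix. The toy matrix and the target rank k=2 are illustrative assumptions, not data or parameters from the paper.

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents,
# entries = term-frequency counts (illustrative values only).
A = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 2, 0, 1],
    [0, 0, 1, 2],
], dtype=float)

k = 2  # target dimensionality of the reduced semantic space (assumed)

# LSA: factor A = U * diag(s) * Vt and keep only the top-k singular triples.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Each column of docs_k is a document represented in the k-dimensional
# latent semantic space; similar documents move closer together here.
docs_k = np.diag(s[:k]) @ Vt[:k, :]
print(docs_k.shape)  # 4 documents, each reduced from 4 term dims to 2
```

Clustering would then run on the columns of `docs_k` instead of the original high-dimensional columns of `A`, which is what makes the downstream computation cheaper.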