Dongli Research Big Data Discovery System (DRDS)

A Sampling Data Selection Algorithm for Kriging Interpolation in Ground-Penetrating Radar Data

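ISSN: 1006-7043
Journal: Journal of Harbin Engineering University (《哈尔滨工程大学学报》)
Date: 0
Classification: TP181 [Automation and Computer Technology: Control Science and Engineering; Automation and Computer Technology: Control Theory and Control Engineering]
Author affiliations: [1] School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu; [2] Key Laboratory of Computer Network and Information Integration (Ministry of Education), Southeast University, Nanjing 211189, Jiangsu
Funding: Key Program of the National Natural Science Foundation of China (61003040, 61100135, 61302157)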

Keywords: topic detection, vector space model, latent semantic analysis, text clustering, singular value decomposition

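Chinese abstract (translated):

The rapid growth of the Internet and of the volume of data has created an urgent need to identify current news hotspots quickly and effectively, and online news topic detection has become an active research area. For representing news text features in an online setting, the dimensionality of the traditional vector space model keeps growing as data accumulates, which aggravates data sparsity and the homonymy problem (the same term carrying different meanings) and makes text similarity hard to measure accurately. Latent semantic analysis based on feature weighting is used to map the high-dimensional, sparse term-document matrix into a latent k-dimensional semantic space, so that the semantic information between terms and documents is fully exploited; this raises the semantic similarity between documents on the same topic and overcomes text sparsity and homonymy in the online setting. In addition, for large and continuously growing news data, traditional clustering algorithms suffer from high time complexity or dependence on input, and cannot deliver good results quickly and effectively. Based on the temporal order and correlation of news reports, an improved Single-pass online incremental clustering algorithm is proposed to detect topic clusters, and a topic heat value is introduced to select the hot topics currently receiving the most attention. Experimental results show that the method effectively improves the accuracy of topic detection and captures topics online on a real news data set.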
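The mapping from a weighted term-document matrix to a k-dimensional latent space can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the feature weighting, so a smoothed TF-IDF is assumed here, and the function names (`tfidf_weight`, `lsa_embed`, `cosine`) and the toy vocabulary are hypothetical.

```python
import numpy as np

def tfidf_weight(counts):
    """Weight a term-document count matrix (terms x docs) with a smoothed TF-IDF.

    Assumption: the paper's actual feature weighting is not given in the abstract.
    """
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[1]
    df = np.count_nonzero(counts, axis=1)           # document frequency per term
    idf = np.log((1 + n_docs) / (1 + df)) + 1.0      # smoothed inverse document frequency
    return counts * idf[:, None]

def lsa_embed(weighted, k):
    """Return one k-dimensional row per document using a truncated SVD."""
    u, s, vt = np.linalg.svd(weighted, full_matrices=False)
    # Keep the k largest singular values; document coordinates are S_k * V_k^T.
    return (s[:k, None] * vt[:k]).T

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Toy data. Rows: car, automobile, engine, flower, petal. Columns: three documents.
counts = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
])
weighted = tfidf_weight(counts)
docs = lsa_embed(weighted, k=2)

# Documents 0 and 1 use different words for the same thing but share "engine".
sim_raw = cosine(weighted[:, 0], weighted[:, 1])
sim_lsa = cosine(docs[0], docs[1])
sim_other = cosine(docs[0], docs[2])
print(sim_raw, sim_lsa, sim_other)
```

In this toy case the two vehicle documents end up closer in the latent space than in the raw weighted space, while the unrelated third document stays orthogonal to them, which is the effect the abstract attributes to latent semantic analysis.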

English abstract:

With the rapid development of the Internet and the continuous growth of massive data, identifying current news topics quickly and effectively has become an urgent demand, and online hot news topic detection has become a hot area of research. For an online news stream, the dimensionality of the traditional Vector Space Model (VSM) grows as data increases, resulting in pronounced problems of data sparsity and synonymy, which make it difficult to calculate text similarity quickly and accurately. Latent semantic analysis based on weighted features is used to map the sparse, high-dimensional word-document matrix to a hidden k-dimensional semantic space, making full use of the semantic information between words and documents to improve the semantic similarity between documents on the same subject and overcoming the problems of text sparsity and synonymy on the Internet. In addition, traditional clustering algorithms suffer from high time complexity and input dependency on growing massive news data, which makes it difficult to obtain the expected result quickly and efficiently. A Single-pass online clustering algorithm is used to detect topic clusters based on the succession and correlation of news in time, and the concept of topic heat is introduced to screen the news topics receiving public attention. Experiments show that the proposed method can effectively improve the accuracy of topic detection.
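The clustering step can be sketched as follows. This is a plain Single-pass clusterer, not the improved variant the abstract refers to, whose details are not given here. The `topic_heat` formula (cluster size decayed by the time since the last report), the `half_life` parameter, the similarity threshold, and all names are assumptions made only for illustration.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

class Topic:
    """A topic cluster summarized by its centroid, size, and latest timestamp."""

    def __init__(self, vec, timestamp):
        self.centroid = np.asarray(vec, dtype=float).copy()
        self.size = 1
        self.last_seen = timestamp

    def add(self, vec, timestamp):
        # Incremental mean update, so earlier documents need not be stored.
        self.size += 1
        self.centroid += (np.asarray(vec, dtype=float) - self.centroid) / self.size
        self.last_seen = max(self.last_seen, timestamp)

def single_pass(stream, threshold):
    """Assign each (timestamp, vector) pair, in arrival order, to a topic.

    A document joins the most similar existing topic if the similarity
    reaches the threshold; otherwise it starts a new topic.
    """
    topics, labels = [], []
    for timestamp, vec in stream:
        vec = np.asarray(vec, dtype=float)
        best, best_sim = None, -1.0
        for idx, topic in enumerate(topics):
            sim = cosine(vec, topic.centroid)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best is not None and best_sim >= threshold:
            topics[best].add(vec, timestamp)
        else:
            topics.append(Topic(vec, timestamp))
            best = len(topics) - 1
        labels.append(best)
    return topics, labels

def topic_heat(topic, now, half_life=24.0):
    """Hypothetical heat score: cluster size halved every `half_life` time units."""
    age = max(0.0, now - topic.last_seen)
    return topic.size * 0.5 ** (age / half_life)

# Toy stream of (timestamp in hours, document vector in a 2-D latent space).
stream = [(0, [1.0, 0.0]), (1, [0.9, 0.1]), (2, [0.0, 1.0]),
          (30, [0.1, 0.9]), (31, [0.0, 1.0])]
topics, labels = single_pass(stream, threshold=0.8)
heats = [topic_heat(t, now=31) for t in topics]
print(labels, heats)
```

Because each document is compared only against the current topic centroids, the stream is processed in one pass, which is what makes this family of algorithms suitable for continuously arriving news. The result depends on arrival order and on the threshold, so both would need tuning on real data.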

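Project of this journal paper

Research on Semi-Supervised Clustering Algorithms Based on Geometric Covering Methods

Journal papers: 11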

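Journal papers from the same project

A Water Quality Assessment Method Using Multi-Sensor Data Fusion Based on Interval Evidence Theory

A Link Prediction Method Using Community Information

An Improved Link Prediction Method for Weighted Networks

Support Vector Machine Optimized by an Improved Particle Swarm Algorithm and Its Application

A Spelling Error Detection Method for Traditional Chinese Based on a String-Segmentation Statistical Dictionary

An Active Learning Word Segmentation Method Based on a Stratified Selection Strategy

An Initial Center Selection Method for the K-means Algorithm Based on Density and Minimum Distance

A KNN Text Classification Algorithm Based on Probabilistic Latent Semantic Analysis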

Novel Apriori-Based Multi-Label Learning Algorithm by Exploiting Coupled Label Relationship

Coupled Attribute Similarity Learning on Categorical Data for Multi-Label Classification

Journal information

Journal of Harbin Engineering University (《哈尔滨工程大学学报》)
Chinese Science and Technology Core Journal

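Supervising body: Ministry of Industry and Information Technology of the People's Republic of China
Sponsor: Harbin Engineering University
Editor-in-chief: Yang Shi'e (杨士莪)
Address: Building 1, No. 145 Nantong Street, Nangang District, Harbin
Postal code: 150001
Email: xuebao@hrbeu.edu.cn
Tel: 0451-82519357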

International standard serial number: ISSN 1006-7043
Domestic serial number: CN 23-1390/U
Postal distribution code: 14-111

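Awards:
"Excellent Journal Award" in the Ministry of Industry and Information Technology science and technology journal evaluation; "Quality Journal Award" in the Chinese university science and technology journal evaluation; "Top Ten Northern Journals Award"; Excellent Journal Award of the first Heilongjiang Provincial Government Publishing Awards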

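Indexed in (domestic and international databases):
Referativnyi Zhurnal (Russia); Chemical Abstracts, online edition (USA); Mathematical Reviews, online edition (USA); Index Copernicus (Poland); Zentralblatt MATH (Germany); Scopus abstract and citation database (Netherlands); Engineering Index (USA); Japan Science and Technology Agency database (Japan); Chinese Science and Technology Core Journals (China); Peking University Core Journals, 2004, 2008, 2011, and 2014 editions (China)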

Citations: 11823