

Research on Topic Detection in Text Streams Based on Feature Ontology

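ISSN: 1001-3695
Journal: 计算机应用研究 (Application Research of Computers)
Date: not listed
Pages: not listed
Classification: TP391.1 [Automation and Computer Technology > Computer Application Technology; Automation and Computer Technology > Computer Science and Technology]
Author affiliations: [1] School of Computer and Information Technology, Shanxi University, Taiyuan 030006; [2] College of Electronics and Information Engineering, Tongji University, Shanghai 201804
Funding: National Natural Science Foundation of China (61403238, 61100138, 61502288, 71171148); Natural Science Foundation of Shanxi Province (2014021022-1, 2011011016-2); Shanxi Province Research Program for Returned Overseas Scholars (2013-022)
Related project: Theoretical and Empirical Research on Dimensionality Reduction of High-Dimensional Complex Data Based on Semantic Computing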

Keywords: feature ontology; topic detection; text stream

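Abstract (translated from the Chinese):

Traditional topic detection methods are grounded in statistical theory and ignore the semantics inherent in the data, which leads to drawbacks such as severe bias and strong dependence on the sample data. To address these drawbacks, this paper proposes a feature-ontology-based topic detection method for text stream data. First, a text feature ontology is constructed. Second, the relatively complex text feature ontology is regarded as a connected graph composed of several topics, and this topic graph is decomposed into a set of single-edge graphs. Third, the topic similarity computation is recast as the computation of single-edge graph contributions and graph similarity. Finally, each new batch of texts is checked for new topics, so that the number of topics grows over time. Empirical studies on scientific literature and news corpora show that the threshold parameter δ determines how often new topics appear in the text stream, and that the results are broadly consistent with those of a classical topic model. Moreover, compared with traditional methods, the proposed method better supports the semantic representation of topics, suits streaming data, and detects topics incrementally, which gives it greater advantages in practical applications.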
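The abstract does not give the paper's formulas for single-edge graph contribution or graph similarity. The following is a minimal Python sketch under assumed definitions: a topic is a dict mapping term pairs (edges) to positive weights, a single-edge graph's contribution is its weight divided by the topic's total edge weight, and topic similarity is the summed minimum contribution over the single-edge graphs two topics share. The names `decompose` and `topic_similarity` and the weighting scheme are hypothetical, not the authors' definitions.

```python
from __future__ import annotations

from collections import defaultdict


def decompose(topic_graph: dict[tuple[str, str], float]) -> dict[frozenset, float]:
    """Split a topic's connected graph into single-edge graphs.

    Each single-edge graph is keyed by the unordered pair of its two terms.
    Its value is an assumed 'contribution': the edge weight divided by the
    total edge weight of the topic (edge weights are assumed positive).
    """
    total = sum(topic_graph.values())
    contributions: dict[frozenset, float] = defaultdict(float)
    for (a, b), weight in topic_graph.items():
        # frozenset makes ("a", "b") and ("b", "a") the same undirected edge
        contributions[frozenset((a, b))] += weight / total
    return dict(contributions)


def topic_similarity(g1: dict[tuple[str, str], float],
                     g2: dict[tuple[str, str], float]) -> float:
    """Weighted overlap of the single-edge graphs shared by two topics.

    Because each topic's contributions sum to 1, the result lies in [0, 1];
    a topic compared with itself scores 1.
    """
    c1, c2 = decompose(g1), decompose(g2)
    shared = c1.keys() & c2.keys()
    return sum(min(c1[e], c2[e]) for e in shared)
```

Normalising contributions per topic keeps the score independent of how many documents fed each topic, so a large topic and a small one with the same edge structure still compare as identical.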

English abstract:

Traditional topic detection methods are mainly based on statistics and ignore the semantics of the data itself, which results in shortcomings such as serious deviation and heavy dependence on the sample data. Aiming at text streams, this paper puts forward a novel topic detection approach based on a text feature ontology. Firstly, it builds the text feature ontology. Secondly, the complex text feature ontology is viewed as a connected graph composed of several topics, which is then decomposed into a collection of unilateral (single-edge) graphs. Thirdly, the topic similarity computation problem is cast as the problem of computing single-edge graph contributions and graph similarity. Finally, each new batch of texts is checked for new topics, so that the number of topics grows over time. Empirical research on scientific literature and news corpora shows that the threshold parameter δ determines the frequency of new topics in the text stream, and that the results are almost consistent with those of a classical topic model. In addition, compared with traditional methods, the proposed approach supports the semantic representation of topics, is suitable for data streams, and can perform online topic detection, and thus has more advantages in applications.
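The batch-wise detection loop can be sketched as follows, again under stated assumptions. Each batch is reduced to one candidate topic graph built from term co-occurrence (a hypothetical stand-in for the text feature ontology), weighted Jaccard similarity is swapped in for the paper's graph similarity, and `delta` plays the role of the threshold δ: a batch whose best similarity to every known topic falls below `delta` opens a new topic, otherwise it is merged into the closest existing topic. The functions `cooccurrence_graph`, `weighted_jaccard`, and `detect` are illustrative names, not the authors' code.

```python
from __future__ import annotations

from itertools import combinations


def cooccurrence_graph(docs: list[list[str]]) -> dict[frozenset, float]:
    """Build a term co-occurrence graph for one batch (assumed stand-in for
    the feature ontology): every unordered pair of distinct terms that appear
    in the same document adds 1 to that edge's weight."""
    graph: dict[frozenset, float] = {}
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            key = frozenset((a, b))
            graph[key] = graph.get(key, 0.0) + 1.0
    return graph


def weighted_jaccard(g1: dict[frozenset, float], g2: dict[frozenset, float]) -> float:
    """Weighted Jaccard similarity over edge weights (a swapped-in measure)."""
    edges = g1.keys() | g2.keys()
    num = sum(min(g1.get(e, 0.0), g2.get(e, 0.0)) for e in edges)
    den = sum(max(g1.get(e, 0.0), g2.get(e, 0.0)) for e in edges)
    return num / den if den else 0.0


def detect(batches: list[list[list[str]]], delta: float) -> list[dict[frozenset, float]]:
    """Process batches in arrival order and return the list of topic graphs.

    The number of topics only grows: a batch too dissimilar from every
    existing topic (best score < delta) becomes a new topic; otherwise its
    edge weights are added to the most similar existing topic.
    """
    topics: list[dict[frozenset, float]] = []
    for docs in batches:
        candidate = cooccurrence_graph(docs)
        if not candidate:  # no term pairs in this batch, nothing to compare
            continue
        scores = [weighted_jaccard(candidate, t) for t in topics]
        if not scores or max(scores) < delta:
            topics.append(candidate)  # a new topic emerges
        else:
            best = topics[scores.index(max(scores))]
            for edge, weight in candidate.items():  # absorb into closest topic
                best[edge] = best.get(edge, 0.0) + weight
    return topics
```

Raising `delta` makes new topics appear more often, which mirrors the abstract's finding that the threshold governs how frequently new topics emerge. Treating a whole batch as one candidate topic is a simplification, since a real batch may contain several topics.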
